Abstract

Model-based reasoning about physical systems has several well-known advantages 
over heuristic expert systems. These include correctness of conclusions, explanations 
of conclusions, ease of modifiability and ease of transfer of expertise to new physical 
systems. On the other hand, reasoning from a model can be slow. This thesis explores 
ways to augment a model-based diagnostic program with a learning component, so 
that it speeds up as it solves problems. 

Several learning components are proposed, each exploiting a different kind of sim- 
ilarity between diagnostic examples. Through analysis and experiments, we explore 
the effect each learning component has on the performance of a model-based diagnos- 
tic program. We also analyze more abstractly the performance effects of Explanation- 
Based Generalization, a technology that is used in several of the proposed learning 
components. 
Chapter 1
Introduction 



1.1 Introduction 

Consider a model-based diagnostic engine. Given a structural and behavioral 
description of a device, and a set of observed measurements at certain locations in 
the device, the diagnostic engine identifies components whose misbehavior can explain 
the misbehavior of the device [HD87, Dav84, Gen84]. Unfortunately, a model-based
diagnostic engine does not learn from experience. Given the identical device and 
observations a second time, it repeats the elaborate causal reasoning necessary to 
produce the same diagnosis. Yet, if "similar" problems occur frequently, it is of 
considerable utility for the system to recognize new problems as "the same as" ones 
encountered before, and then jump to the "same" conclusions, skipping the details 
of the reasoning process. 

Consider, for example, the circuit of Figure 1.1(a) with the observations shown. 
A model-based diagnostic engine propagates values through the circuit to detect 
contradictions. In solving the first example, it multiplies 3 and 2 to predict 6 at 
X; it multiplies 3 and 2 to predict 6 at Y; then it adds 6 and 6 to predict 12 at F. 
That prediction of 12 contradicts the observation, so the program concludes that M1,
M2, or A1 must be broken. It then does some additional propagation of values to
conclude that M2 is not the broken component, and finally outputs M1 and A1 as
the single-fault candidates. That is, either multiplier M1 or adder A1 alone could,
by some misbehavior, account for all of the observed misbehavior of the circuit.

Now suppose the program is given the circuit again, this time with the obser- 
vations in Figure 1.1(b). The two cases look quite different on the surface, but the 
answer is the same (M1 and A1 are the only consistent candidates), and there
is another interesting similarity: the diagnostic engine propagates values through 
the same components, in the same order, and detects contradictions in exactly the 
same places. In other words, the diagnostic program performs the same pattern of 
inferences in diagnosing the two cases. 
Figure 1.1: The Polybox Circuit, designed to calculate AC + BD and CE + BD.
M1, M2, and M3 are multipliers. A1 and A2 are adders. With either of the sets of
observations shown, M1 and A1 are the only single-fault candidates.

The augmented diagnostic program described in Chapter 2 can recognize the 
applicability of a pattern of inferences from a previous problem. It does this by 
remembering general preconditions for the patterns of inferences it used in diagnosing 
each case. In diagnosing a new case, it checks each of the remembered general 
preconditions against the observations. Given the two cases described in Figure 1.1, 
the program diagnoses the first, generalizes the patterns of inferences that were useful, 
and, in diagnosing the second case, restricts the candidate set to M1 and A1 before
"looking inside" the circuit. 



1.2 Summary of Contributions 

The learning methods proposed in this thesis all try to extract useful lessons from 
single examples. Those lessons are then used to recognize similarities between new 
examples and previous examples that have already been solved. We ask what kinds 
of similarities can be exploited and what lessons can be extracted that will enable 
recognition of those similarities. Thus, the overall theme is to extract as much useful 
information as possible from single diagnostic examples. 

There are three main contributions of this research: 

• We present many notions of similarity for diagnostic examples. Trying to rec- 
ognize each type of similarity provides an interesting possibility for learning. 

• We analyze the effect on performance of looking for each kind of similarity. 
• We analyze the strengths and weaknesses of Explanation-Based Generalization
(EBG), a technology that is used in several learning methods throughout the 
thesis. 

1.3 Multiple Definitions of Similarity 

Each different type of similarity suggests a different set of lessons that a learning 
program should extract from examples. One contribution of this research is to propose 
several definitions of similarity for diagnostic cases and learning components that 
generalize examples based on each of the definitions. The definitions of similarity are 
summarized below. 

Chapter 2 discusses classifying two cases as similar if the same pattern of inferences 
applies to both. For example, as already mentioned, in Figure 1.1 the two sets of 
observations for the device are similar because the observations can be propagated 
through the same components, in the same order, leading to a contradiction at the 
same location. That is, the same pattern of inferences is applicable to both sets of 
observations. 

We can relax that definition of similarity for sets of device observations if we do not
require that the sequence of value propagations use exactly the same components,
but only components that play the same role in the circuit. Section 6.1.1 defines
similarity for patterns of inferences in terms of equivalent roles played by components.
For example, propagating a value through the first bit-slice of a carry-chain adder is
similar to propagating a value through the second bit-slice. As a result, two sets of
observations for a carry-chain adder can be defined as similar if "similar" patterns of 
inferences are applicable to them: that is, if propagating values through either the 
first or the second bit-slice leads to a contradiction. 

Chapter 5 discusses classifying two sets of observations as similar if they can be 
caused by the same misbehavior of a particular component. For example, the two 
sets of observations in Figure 1.1 can both be caused by the first bit of M1's output
being stuck-at 0. 

Again, we can relax that definition of similarity for sets of device observations 
if we do not require exactly the same component misbehavior to explain the two 
sets of observations, but only a similar misbehavior. Thus, Section 6.2.1 defines two 
misbehaviors for different components as similar if the components play equivalent 
roles in the circuit and the misbehavior is the same. For example, in Figure 1.1, the 
first bit of M1's output being stuck-at 1 is similar to the first bit of M3's output being
stuck-at 1. Section 6.2.2 proposes a different definition of similarity for component
misbehaviors. There, the first bit of M1's output being stuck-at 1 is defined as
similar to the first bit of M1's output being stuck-at 0. Using either of those notions
of similarity for component misbehaviors, we define two sets of observations as similar 
if they can both be explained by "similar" component misbehaviors. 
In order to recognize each of the types of similarity described above, a learning
system has to extract appropriate lessons from the examples it solves. The programs 
described in this thesis use explanation-based methods to construct those lessons 
(which we call generalized rules). Explanation-based methods have two advantages 
over statistical similarity-based methods. First, explanation-based methods use do- 
main knowledge to guide generalization from single cases, while statistical methods 
require numerous cases. This allows the explanation-based learner to learn more 
quickly, based on fewer cases. Second and more important, in contrast to statisti- 
cal similarity-based methods, the explanation-based methods we use do not require 
an inductive bias as to the correct language and form in which to describe classes 
of cases. Instead, the language used to describe the behavior and misbehavior of 
components provides the language to use in classifying device misbehavior. 

1.4 Selecting Useful Definitions of Similarity 

Some notions of similarity are more useful than others for speeding up perfor- 
mance, because there is a cost to looking for similarities. If the benefits gained from 
exploiting similarities in solving new examples do not outweigh the costs of looking 
for the similarities, performance will even deteriorate. Each notion of similarity (e.g., 
same pattern of inferences) gives rise to several generalized rules. Some insight into 
which notions are useful in improving performance can be gained from analyzing 
which individual generalized rules will improve performance. We gain even stronger 
insights by analyzing the aggregate effects of all of the generalized rules. 

1.4.1 Utility of an Individual Generalized Rule 

While we assume that the fixed cost of constructing a generalized rule will be 
amortized over an indefinite number of cases, and hence can be ignored, it is wise to 
examine the costs and benefits of using a generalized rule. We suggest three criteria 
that a generalized rule must satisfy in order to improve the performance of a problem 
solver:^1

Recurrent Not only must a generalized rule apply to many cases (the traditional 
generality criterion), it must also apply to the cases that the problem-solver 
will actually encounter.

Manifest It must be inexpensive to check whether a generalized rule applies to a
new case. 

Exploitable The knowledge that the generalized rule applies to a new case must 
provide some discriminatory power in reasoning about the case. 

^1 These three factors were also identified independently in [Min88, MCE+87].
Of course, these criteria are not binary predicates: a generalized rule may be
more or less recurrent, manifest, or exploitable. Consider for a moment why rules 
that do not satisfy these criteria will not be useful in speeding up performance. 
If a generalized rule's applicability can be checked at almost no cost (manifest) and 
provides a complete solution to the cases in which it is applicable (exploitable), but it 
is not applicable to any of the cases that the system is presented with (no recurrence), 
checking that generalization will slow down the system. Similarly, if a generalization 
is applicable in nearly every case, and knowing it is applicable provides a complete 
solution, but it costs more to check the generalized rule than to solve the problem 
from first principles, again, checking the generalized rule only slows down the system. 
Finally, if a generalized rule applies to nearly all of the cases and can be recognized 
at almost no cost, but it provides no discriminatory power in reasoning about the 
case (e.g., a rule that applies to every set of observations), there is no advantage to 
using it. 
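One way to make the three failure modes above concrete is a back-of-the-envelope expected-saving calculation. The sketch below is an illustrative assumption, not a formula from the thesis: it treats recurrence as the probability that a rule applies, manifestness as the cost of checking the rule, and exploitability as the work saved when it applies. The function name and all of the numbers are made up for illustration.

```python
def expected_net_saving(p_applies, saving_when_applies, check_cost):
    """Expected saving per case from keeping one generalized rule.

    p_applies           -- chance the rule applies to an encountered case (recurrence)
    saving_when_applies -- work avoided when the rule applies (exploitability)
    check_cost          -- cost of testing the rule, paid on every case (manifestness)
    """
    return p_applies * saving_when_applies - check_cost

# Hypothetical numbers, in arbitrary units of work.
never_recurs  = expected_net_saving(0.0, 10.0, 0.01)   # cheap and decisive, never applies
costly_check  = expected_net_saving(0.9, 10.0, 12.0)   # check costs more than solving
no_leverage   = expected_net_saving(0.9, 0.0, 0.01)    # applies often, tells us nothing
useful_rule   = expected_net_saving(0.5, 10.0, 0.5)    # reasonable on all three criteria

for name, value in [("never recurs", never_recurs), ("costly check", costly_check),
                    ("no leverage", no_leverage), ("useful rule", useful_rule)]:
    print(f"{name}: {'slows down' if value < 0 else 'speeds up'} the system")
```

Under this simplified model, each of the three deficient rules has a negative expected saving, which matches the qualitative argument in the paragraph above.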

The utility of a notion of similarity is just the sum of the utilities of the generalized 
rules it gives rise to. Hence, there are three analogous criteria for evaluating the utility 
of a notion of similarity. First, large numbers of cases that the troubleshooting engine 
is likely to encounter should be similar according to that notion of similarity. Second, 
it should be inexpensive to recognize that kind of similarity. Finally, recognizing that 
the current case is similar to a previous case should help in solving the current case. 

1.4.2 Aggregate Utility 

We can gain further insight into the utility of a notion of similarity by analyzing 
the aggregate effects of all of the generalized rules. Chapter 3 presents such an aggregate
analysis for the learning system described in Chapter 2. That learning system
defines two sets of observations as similar if the same derivation of contradictory 
values is applicable to both. Experimental results show that single-fault diagnostic 
speed improved on both the circuit shown in Figure 1.1 and a gate-level implemen- 
tation of a carry-lookahead adder. More importantly, we present a breakdown of the 
operations involved in diagnosis, both with and without the generalized rules. The 
cost breakdown enables rough predictions about how the learning system will affect 
performance on other devices. The cost breakdown also enables predictions about 
how changes to the original diagnostic engine would affect the utility of the learning 
system. 

1.5 Performance Effects of EBG 

Chapter 4 takes a closer look at Explanation-Based Generalization, or EBG 
[MKKC86], a technology used in the learning system of Chapter 2 and several of 
the learning systems described in Chapters 5 and 6. Again, we give an aggregate 
analysis, viewing the generalized rules constructed by EBG in terms of changes to
the search strategy of a problem solver. We analyze the sources of power in two common
uses of EBG to improve performance: generalizing successful problem solving
episodes, and generalizing the explanations that search nodes are inconsistent (often 
referred to as learning from failure [Min88, MB87, Ham87, Paz86]). We summarize
that analysis below. 

1.5.1 Generalizing Successful Patterns of Inferences 

There are two sources of power in using EBG to generalize successful problem 
solving episodes. First, the generalized rules can bias the problem solver's search 
toward patterns that have been useful in solving previous problems, hence away 
from patterns that have never been useful. Second, the generalized rules encapsulate 
patterns of operator applications: the program can check the preconditions of a 
whole pattern and jump to the conclusions without ever incurring the overhead costs 
of binding variables for the operators and storing intermediate results. The following 
highlights some key observations from our analysis: 

Biasing Search EBG on its own does not capture enough information about the 
frequency with which patterns of inferences are applicable to be able to bias 
search in the most effective way possible. EBG is a technique for learning 
from a single example, but frequencies of applicability are properties of whole 
distributions of examples. 

Encapsulation The benefits from encapsulation depend on the relative cost of evaluating
the bodies of search operators versus the cost of binding variables for
the operators and storing the results of operator applications. 

Caveat: Searching For All Solutions If the problem solver's task is to find all 
of the solution states, using EBG to identify one or a few solution states may 
not reduce the search for the rest of the solution states. 

1.5.2 Generalizing Proofs of Inconsistency 

There are two potential sources of power in using EBG to generalize explanations 
of the inconsistency of search nodes. First, proving the inconsistency of a search 
node may be very expensive; recognizing that a previously successful derivation of 
an inconsistency is applicable may reduce that cost. In this case, generalizing ex- 
planations of failures is the same as generalizing successful patterns of inferences in 
the space of derivations of inconsistencies. Performance may improve due to either 
search reduction or encapsulation, or both. 

Second, knowing the inconsistency of some search nodes may enable the problem 
solver to ignore a large portion of the original search space. The problem solver
may cut off search either below or above the search nodes that the generalized rules
identify as inconsistent. 

Cutting Off Below Inconsistent Nodes If goal nodes are never reached from in- 
consistent nodes, the problem solver can cut off search at a node that a gen- 
eralized rule identifies as inconsistent. One must be careful in measuring these 
gains, however, because a well-designed original problem solver may be able to 
cut off search below inconsistent nodes even without the generalized rules. 

Cutting Off Above Inconsistent Nodes The problem solver may be able to com- 
bine information provided by more than one generalized rule to cut off search 
above the nodes that the generalized rules identify as inconsistent. One exam- 
ple of this is the use of the single-fault assumption in diagnosis to intersect sets 
of components that the generalized rules identify as inconsistent. 

1.6 Map of the Thesis 

In summary, Chapters 2, 5, and 6 present several ways that diagnostic examples
can be thought of as similar, and how single examples can be generalized in
order to capture those similarities. Since there are costs as well as benefits to us- 
ing generalizations, performance analysis permeates the entire thesis. Chapter 3 in 
particular presents a detailed performance analysis and experimental results for the 
learning program described in Chapter 2. Chapter 4 analyzes the sources of power 
in Explanation-Based Generalization. 



Chapter 2 

Similar = Same Contradiction 
Derivations 



This chapter describes a learning system that remembers useful patterns of inferences 
and checks their applicability to new diagnostic cases. That is, two sets of observa- 
tions for a given circuit are considered similar if certain patterns of inferences are 
applicable to both. The particular patterns that are of interest are those that derive 
contradictions by propagating values through the circuit components. The original 
troubleshooting engine uses derivations of contradictory values to identify "conflict 
sets," sets of components that cannot all be working properly. The learning mecha- 
nism creates a generalized rule from each derivation. The generalized rules are then 
used to check the applicability of the derivations to future sets of observations for 
the same circuit. The generalized rules can speed up diagnosis by identifying con- 
flict sets faster than the original diagnostic engine can identify them through value 
propagation. 

Section 2.1 presents a single-fault candidate generator for model-based diagnostic 
problems.^1 Section 2.2 presents the learning mechanism, which uses Explanation-Based
Generalization to encapsulate derivations of contradictory values. Section 2.3
describes an augmented diagnostic engine that uses the learning mechanism to con- 
struct generalized rules, and then uses the generalized rules in diagnosing future 
cases. Diagnosis of the polybox circuit is used throughout to illustrate the algo- 
rithms. Section 2.4 provides an extended example that demonstrates how the system 
constructs and uses generalized rules in diagnosis of a gate-level implementation of 
a carry-lookahead adder. Chapter 3 presents experimental results that demonstrate 
that the augmented diagnostic algorithm can improve performance. 



^1 Section 4.2.2 discusses why the technique described here would not speed up the multiple-fault
candidate generation process used in GDE [dKW87].
Figure 2.1: The first case



2.1 Candidate Generation 



This section presents the single-fault candidate generation method described in 
[HD87, Dav84, Gen84]. Device structure is described by a list of components and 
their interconnections. Component behavior is modeled by rules for inferring an input 
or output from other inputs and outputs. For example, the behavior of an adder is 
modeled by three rules. The first computes the output by adding the two inputs. 
The other two each compute one of the inputs by subtracting the other input from 
the output. Thus, given values on any two "ports" of the adder, the rules predict a 
value on the remaining port. 
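The three adder rules can be sketched as one function that predicts whichever port is missing. This is an illustrative sketch rather than the thesis's implementation; the function name `adder_predict` and the dictionary encoding of ports are assumptions made here.

```python
def adder_predict(ports):
    """Apply the adder's behavior rules.

    `ports` maps "in1", "in2", and "out" to a number, or to None when unknown.
    Returns (port_name, predicted_value) for the single unknown port, or None
    when fewer than two ports are known or nothing remains to predict.
    """
    in1, in2, out = ports["in1"], ports["in2"], ports["out"]
    if out is None and in1 is not None and in2 is not None:
        return ("out", in1 + in2)   # rule 1: output = sum of the two inputs
    if in1 is None and in2 is not None and out is not None:
        return ("in1", out - in2)   # rule 2: first input = output - second input
    if in2 is None and in1 is not None and out is not None:
        return ("in2", out - in1)   # rule 3: second input = output - first input
    return None

print(adder_predict({"in1": 6, "in2": 6, "out": None}))   # forward prediction
print(adder_predict({"in1": 6, "in2": None, "out": 10}))  # backward prediction
```

A multiplier can be modeled the same way, with division in place of subtraction for the two backward rules.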

Given observations of the values at the inputs and outputs of a circuit, the candi- 
date generation program first propagates input values through the circuit, using the 
circuit structure and the component behavior rules. For example, in Figure 2.1, the 
behavior rule for multiplier M1 predicts 6 at X from the inputs 3 and 2. If the circuit
is malfunctioning, predicted values at the outputs will contradict the values observed, 
as in Figure 2.1, where the predicted value of 12 at F contradicts the observed value 
of 10. 

The next question the program asks is: which components were used in deter- 
mining the predicted value at the contradiction site? The program calculates that 
set, called a conflict set, by tracing back through a dependency trail to find those 
components whose behavior rules were used in predicting the contradicted value.^2
For instance, in the first example above, (M1 M2 A1) is a conflict set, since M1, M2,
and A1 are the only components that are needed to predict the value 12 at F.
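Forward propagation with dependency tracking can be sketched in a few lines. The encoding below (the `POLYBOX` list and the function name) is an assumption made for illustration. Each derived value carries the set of components whose rules produced it, so the conflict set falls out when a prediction contradicts an observation.

```python
import operator

# Hypothetical encoding of the polybox, listed in topological order:
# (component, operation, input node, input node, output node)
POLYBOX = [
    ("M1", operator.mul, "A", "C", "X"),
    ("M2", operator.mul, "B", "D", "Y"),
    ("M3", operator.mul, "C", "E", "Z"),
    ("A1", operator.add, "X", "Y", "F"),
    ("A2", operator.add, "Y", "Z", "G"),
]

def forward_conflict_sets(observations):
    """Propagate inputs forward; return {output: conflict set} for every
    output whose predicted value contradicts the observed value."""
    # Each node holds (value, components used to derive it); inputs use none.
    derived = {node: (observations[node], frozenset()) for node in "ABCDE"}
    for name, op, in1, in2, out in POLYBOX:
        (v1, d1), (v2, d2) = derived[in1], derived[in2]
        derived[out] = (op(v1, v2), d1 | d2 | {name})
    return {out: set(derived[out][1])
            for out in "FG" if derived[out][0] != observations[out]}

# The first case: F is observed as 10 but predicted as 12.
case1 = dict(A=3, B=3, C=2, D=2, E=3, F=10, G=12)
print(forward_conflict_sets(case1))
```

For the first case, only output F is contradicted, and the recorded dependencies are M1, M2, and A1. M3 never appears because its rule was not used to predict F.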

Conflict sets are valuable because they restrict the troubleshooter's attention to 
the components that can account for the observed symptoms. If a component such as M3 is
not in a conflict set (e.g., (M1 M2 A1)), its behavior is irrelevant to the associated
contradiction: the contradiction will exist no matter how the component is behaving.
M3 may or may not be working, but it cannot explain the contradiction at F.^3

The candidate generator described here looks for all of the single-point-of-failure
candidates. If a single component is to account for all of the observed misbehavior,
it must be in every conflict set. Hence, the troubleshooting engine keeps track of the
intersection of all the conflict sets found so far, which we term the suspect set. In
other words, any component in the complement of a conflict set is exonerated.

Each suspect is then tested by a process called constraint suspension [Dav84]: 
the program assumes that all of the other components are working but disables the
suspect's behavior rules (suspends the constraints it places on circuit values). If the
remaining components can be used to derive a contradiction, the suspect is ruled 
out, since it cannot account for that symptom. In addition the program identifies 
another conflict set, possibly further restricting the suspect set. If no contradiction 
is derived, the suspect is a candidate. 

In the example of Figure 2.2, the initial suspect set is (M1 M2 A1), and constraint
suspension is performed on M2. Disabling the behavior rules for M2
resolves the contradiction at F by making it impossible to predict the value 12 there.
Another contradiction is predicted, however, at Y: A1 predicts the value 4 (from 10
at F and 6 at X), while A2 predicts 6 (from 12 at G and 6 at Z). The new conflict set
is (A1 A2 M1 M3). One of the components in each conflict set must be broken. Thus,
by the single-fault assumption, the broken component must be in the intersection of
the conflict sets. Intersecting reduces the suspect set to (M1 A1). In this case, only
the component that was suspended, M2, is exonerated. If a contradiction is found
during constraint suspension, the suspended component will always be exonerated.
In general, other components may be exonerated as well when the new conflict set is
intersected with the previous ones.

The remaining suspects, M1 and A1, are then tested in turn, but no further
contradictions are found. M1 and A1 are the single-fault candidates.

^2 Our troubleshooter propagates values and recovers the conflict sets by tracing dependency
records, unlike the ATMS-style diagnosis of [dKW87] that propagates environments and thus builds
conflict sets as part of the propagation process. Section 3.8 discusses the performance effects from
learning that would occur using an ATMS implementation of single-fault candidate generation.

^3 This assumes that the model we are given is correct; in particular that its topology correctly
models the connectivity of the circuit. To see what happens when this is not true, see [Dav84].

Figure 2.2: Constraint Suspension on M2



1. Assume all components are working. Propagate values from inputs
to outputs to find an initial contradiction, yielding an initial
suspect set.

2. While there are still unexamined suspects:

(a) Choose an unexamined suspect at random.

(b) Perform constraint suspension on the suspect.

(c) If a contradiction is found, form the conflict set, which reduces
the suspect set.

(d) If no contradiction is found, add the suspect to the candidate set.

(e) Remove the suspect from the unexamined suspect set.



Figure 2.3: The Original Diagnostic Algorithm



Candidate generation is a winnowing process: suspect components that cannot
account for all of the misbehavior of the device are exonerated. An effective way to
determine the candidates is to identify the conflict sets. Hence, any process that can
speed up the identification of conflict sets has the potential to speed up the system's
performance.
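The algorithm of Figure 2.3 can be sketched as a small constraint propagator for the polybox. This is a minimal illustration under stated assumptions, not the thesis's program: the `POLYBOX` encoding and all function names are hypothetical, the suspect is chosen in alphabetical order rather than at random so that runs are repeatable, and the multiplier's backward rules simply skip division by zero.

```python
from fractions import Fraction

# Hypothetical encoding: component -> (kind, input node, input node, output node)
POLYBOX = {
    "M1": ("mul", "A", "C", "X"), "M2": ("mul", "B", "D", "Y"),
    "M3": ("mul", "C", "E", "Z"), "A1": ("add", "X", "Y", "F"),
    "A2": ("add", "Y", "Z", "G"),
}

def predictions(kind, vals):
    """Behavior rules. vals = [in1, in2, out], None where unknown.
    Returns (target port index, predicted value) pairs."""
    a, b, out = vals
    preds = []
    if a is not None and b is not None:
        preds.append((2, a + b if kind == "add" else a * b))
    for target, other in ((0, b), (1, a)):   # backward rules for each input
        if out is None or other is None:
            continue
        if kind == "add":
            preds.append((target, out - other))
        elif other != 0:
            preds.append((target, Fraction(out, other)))
    return preds

def find_conflict(obs, suspended=None):
    """Propagate values with dependency records; return the conflict set of
    the first contradiction found, or None if the values are consistent."""
    known = {node: (value, frozenset()) for node, value in obs.items()}
    progress = True
    while progress:
        progress = False
        for name, (kind, *ports) in POLYBOX.items():
            if name == suspended:            # constraint suspension
                continue
            vals = [known[p][0] if p in known else None for p in ports]
            for target, value in predictions(kind, vals):
                deps = frozenset([name]).union(
                    *(known[p][1] for i, p in enumerate(ports) if i != target))
                node = ports[target]
                if node not in known:
                    known[node] = (value, deps)
                    progress = True
                elif known[node][0] != value:
                    return deps | known[node][1]
    return None

def single_fault_candidates(obs):
    initial = find_conflict(obs)             # step 1
    if initial is None:
        return set()
    suspects, examined, candidates = set(initial), set(), set()
    while suspects - examined:               # step 2
        suspect = min(suspects - examined)   # 2(a), deterministic here
        examined.add(suspect)                # 2(e)
        conflict = find_conflict(obs, suspended=suspect)   # 2(b)
        if conflict is None:
            candidates.add(suspect)          # 2(d)
        else:
            suspects &= conflict             # 2(c)
    return candidates

case1 = dict(A=3, B=3, C=2, D=2, E=3, F=10, G=12)
print(sorted(single_fault_candidates(case1)))   # ['A1', 'M1']
```

On the first case, suspending M2 yields the second conflict set (M1 M3 A1 A2), and suspending M1 or A1 yields no contradiction, so both remain as candidates.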




2.2 The Learning Component 

Now suppose that the program is used repeatedly to diagnose devices with the 
same design description, but is given different sets of observations each time. Almost 
any hardware device seems to have a few weak links that break more frequently 
than the rest of the components of the device. Hence, it is likely that the diagnostic 
engine repeatedly will perform the same sequences of value propagations that predict 
contradictions. That is, the program may be given different observations, but will
frequently predict contradictory values by propagating those observations through 
the same sequence of components. 

This section describes an algorithm that creates a generalized rule from a deriva- 
tion of contradictory values. Once created, the generalized rule can check efficiently 
whether all of the steps of a derivation apply to a new case, without actually perform- 
ing the derivation. When the derivation does apply to a new case, the troubleshooter 
will use the rule to jump to the conclusion, and construct the conflict set without 
propagating values through the circuit. 

2.2.1 Explanation-Based Generalization 

Explanation-Based Generalization (EBG) [MKKC86] is a widely known method
of using domain knowledge to learn from a single example. EBG constructs sufficient 
conditions for concluding that a pattern of inferences is applicable. This section 
presents the EBG framework and describes how our generalization machinery maps
onto it. Readers unfamiliar with the EBG framework may choose to skip to the 
examples in Section 2.2.3 before reading this section. 

In EBG, the system is given a goal-concept (a description of a class of examples), 
a positive instance of that concept, a domain theory, and an operationality criterion. 
The initial formulation of the goal concept does not satisfy the operationality criterion.
The task is to find a reformulation that does satisfy the operationality criterion,
using the training example and the domain theory. The performance program uses
the domain theory (a set of inference rules) to prove (explain) that the training example
satisfies the given formulation of the goal concept. A generalization algorithm
then finds the weakest set of preconditions under which the same proof would apply. 
These preconditions ignore "irrelevant" features, and replace constants with predi- 
cates on variables. If these preconditions satisfy the operationality criterion, they are 
the desired reformulation of the goal concept. 

Because a single explanation of why the instance satisfies the goal concept guides 
the generalization, the reformulation is a specialization of the original goal concept. 
It gives necessary and sufficient conditions for the particular explanation to be applicable.
If other explanations are possible, however, this reformulation provides only
sufficient conditions, and not necessary conditions, for recognizing future cases as
instances of the goal concept.

2.2.2 EBG on Conflict Set Derivations 

In generalizing from a candidate generation case, there will be one goal concept 
(and hence one generalized rule) that corresponds to each conflict set found in diag- 
nosing the case. 

Goal Concept The sets of observable values for the device from which a given set 
of components can predict contradictory values. For example: "The sets of 
observations for which (M1 M2 A1) is a conflict set."

Training Example The training example consists of one set of observations for the 
device that is a positive instance of the goal concept. For example, (A=3; B=3; 
C=2; D=2; E=3; F=10; G=12) is a set of observations for which (M1 M2 A1)
is a conflict set. 

Domain Theory The domain theory consists of the structure of the device and the 
behavior descriptions of the device's components. 

Operationality Criterion Since the purpose of generalizing is to enable the pro- 
gram to make some diagnostic inferences before tracing through the structure 
of a circuit, the operationality criterion requires that predicates be testable on 
sets of observations without propagating values in the circuit. The reformulated
goal concept can thus mention only the observables of the circuit, and
not internal values. 

Proof The dependency trail of the derivation of contradictory values serves as the 
proof that the training example satisfies the goal concept. 

Our algorithm for finding the weakest preconditions replaces the actual observations
with variables, and works forward through the proof tree, running each of
the behavior rules to predict symbolic values (i.e., expressed in terms of variables).^4
Behavior rule firings that occur later in the derivation then propagate the symbolic 
values derived. Some restrictions on the symbolic values may be needed to satisfy 
the preconditions of the behavior rules. These restrictions, together with a predi- 
cate which ensures that the two values derived are indeed contradictory, form the 
preconditions of the generalized rule. 

2.2.3 Example: Generalizing the Derivation of
Conflict Set (M1 M2 A1)



^4 Our algorithm bears a close resemblance to that of [DM86], which corrects a technical error in
the algorithm presented in [MKKC86].
Figure 2.4: Generalizing the construction of conflict set (M1 M2 A1). Symbolic values
propagated are in brackets.

Figure 2.4 illustrates how the program generalizes from the derivation of the
conflict set (M1 M2 A1). It first substitutes variables A, C, B, D, and F for their respective
values, then reruns the behavior rules. M1's forward behavior rule predicted
6 at X; the symbolic value predicted is (* A C), as shown in brackets in Figure 2.4.
Similarly, M2 predicts (* B D) at Y. A1 uses those values to predict (+ (* A C)
(* B D)) at output F. A contradiction will arise whenever the observed value at F
differs from the value of this expression. The resulting rule in this case is:^5

R1: IF (NOT (= ?F (+ (* ?A ?C) (* ?B ?D))))
    THEN (CONFLICT-SET '(M1 M2 A1))
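The symbolic rerun that produces R1 can be sketched as follows. This is a simplified illustration, not the thesis's generalizer: it handles only forward behavior rules, it ignores any extra restrictions that rule preconditions might impose, and the `DERIVATION` encoding and the function name `generalize` are assumptions made here.

```python
# Hypothetical encoding of the derivation's dependency trail:
# (component, operator symbol, input node, input node, output node)
DERIVATION = [
    ("M1", "*", "A", "C", "X"),
    ("M2", "*", "B", "D", "Y"),
    ("A1", "+", "X", "Y", "F"),
]

def generalize(derivation, contradicted_node):
    """Rerun a derivation on variables and return the generalized rule as text."""
    symbolic = {}   # internal node -> symbolic expression derived for it

    def value(node):
        # Observables become variables; internal nodes use their derived expression.
        return symbolic.get(node, "?" + node)

    for _component, op, in1, in2, out in derivation:
        symbolic[out] = f"({op} {value(in1)} {value(in2)})"

    components = " ".join(step[0] for step in derivation)
    predicted = symbolic[contradicted_node]
    # The rule fires when the observed value differs from the symbolic prediction.
    return (f"IF (NOT (= ?{contradicted_node} {predicted}))\n"
            f"THEN (CONFLICT-SET '({components}))")

print(generalize(DERIVATION, "F"))
```

Because the resulting precondition mentions only observables (?A, ?B, ?C, ?D, ?F), it can be tested on a new set of observations without propagating any values through the circuit.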

What generalization can we make of the construction of the second conflict set
(M1 M3 A1 A2), which exonerated M2? A common answer is the rule that "if F or
G is incorrect but the other is not, then M2 cannot be a suspect."^6 The intuition is
that if M2 is broken, there should be incorrect outputs at both F and G, not just at 
one of them. If they are both incorrect, the intuition is that M2 can be a candidate, 
because it contributes to both outputs. 

The program creates a rule that states: 



^Throughout this thesis, symbols preceded by question marks indicate variables which must be
bound before the rule can be fired. When R1 is checked in a new case, ?F will be bound to the
observed value at output F, ?A to the observed value at input A, and so on.

^Stating that (M1 M3 A1 A2) is a conflict set is equivalent to stating that M2 is not a suspect.

Figure 2.5: Generalizing the construction of conflict set (M1 M3 A1 A2)

R2: IF (NOT (= (- ?F (* ?A ?C))
               (- ?G (* ?C ?E))))
THEN (CONFLICT-SET '(M1 M3 A1 A2))

This rule applies when only one of the two outputs is incorrect, but it also applies 
in many cases where both outputs are incorrect. For example, R2 applies when the 
observables are (A=3; B=3; C=2; D=2; E=3; F=10; G=8), which is a case not 
covered by the generalization produced by common intuition. Common intuition 
fails because even though M2 can account for the misbehavior at either F or G, it 
would have to be malfunctioning in two different ways, producing the outputs 4 and 
2, in order to explain the misbehavior at both F and G. Thus, the computer-generated 
rule correctly exonerates M2 given these observations, whereas the rule produced by 
common intuition does not. 
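
The comparison can be checked directly. The sketch below encodes R1, R2 and the intuitive rule as Python predicates over a dictionary of observations; the function names are ours, and the expected value at G is taken to be B*D + C*E, which follows from R2 treating both F - A*C and G - C*E as predictions for the same node.

```python
def r1_applies(obs):
    """R1: conflict set (M1 M2 A1) when F differs from A*C + B*D."""
    return obs["F"] != obs["A"] * obs["C"] + obs["B"] * obs["D"]

def r2_applies(obs):
    """R2: conflict set (M1 M3 A1 A2) when the two backward predictions disagree."""
    from_f = obs["F"] - obs["A"] * obs["C"]
    from_g = obs["G"] - obs["C"] * obs["E"]
    return from_f != from_g

def intuitive_rule_applies(obs):
    """Common intuition: exactly one of the outputs F and G is incorrect."""
    f_ok = obs["F"] == obs["A"] * obs["C"] + obs["B"] * obs["D"]
    g_ok = obs["G"] == obs["B"] * obs["D"] + obs["C"] * obs["E"]
    return f_ok != g_ok

case = dict(A=3, B=3, C=2, D=2, E=3, F=10, G=8)
print(r1_applies(case), r2_applies(case), intuitive_rule_applies(case))
```

For these observations both outputs should be 12, so the intuitive rule does not fire, while R2 does because the two backward predictions are 4 and 2.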



2.3 The Augmented Diagnostic Algorithm 

We implemented an augmented diagnostic engine that uses the generalized rules
created by EBG to improve diagnostic performance on cases that are "similar" to
cases the program has diagnosed before. The generalized rules enable the program to
recognize when a pattern of inferences from a previous case can be applied to a new
case, and to jump to the same conclusion, the identification of a conflict set. The
diagnostic program starts with a reduced suspect set if the generalized rules identify
some conflict sets, which reduces the total number of suspects on which constraint
suspension must be performed. The program then falls back on constraint suspension
to try to exonerate the remaining suspects. Figure 2.6 gives a more precise description
of the augmented diagnostic algorithm.

1. Retrieve from the library for the device the generalized rules for
noticing conflict sets. Check the applicability of each rule and
intersect the conflict sets identified to form the initial suspect
set.

2. If there are no conflict sets found using generalized rules, propa-
gate values from inputs to outputs to find an initial contradiction,
yielding an initial suspect set.

3. While there are still unexamined suspects:

(a) Choose an unexamined suspect at random.

(b) Perform constraint suspension on the suspect.

(c) If a contradiction is found, form a conflict set, which reduces
the suspect set.

(d) If no contradiction is found, add the suspect to the candidate
set.

(e) Remove the suspect from the unexamined suspect set.

4. Use EBG to generalize each derivation of contradictory values
found by propagating values and add the new rules to the library.

Figure 2.6: The Augmented Diagnostic Algorithm
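
Steps 1 through 3 of the augmented algorithm can be sketched as follows. This is a minimal illustration, not the thesis implementation: `augmented_diagnose`, `toy_suspend` and the rule representation are hypothetical, constraint suspension is passed in as a function, step 4 (rule creation by EBG) is omitted, and the polybox observations in the usage lines are assumed for illustration.

```python
import random

def augmented_diagnose(observations, rule_library, components,
                       initial_conflict, suspend, rng=random):
    """Single-fault diagnosis seeded by generalized rules.

    rule_library: list of (applies, conflict_set) pairs, applies(observations) -> bool.
    initial_conflict(observations): conflict set found by plain propagation.
    suspend(component, observations): a conflict set, or None if no contradiction.
    """
    suspects = set(components)
    found = [set(cs) for applies, cs in rule_library if applies(observations)]
    if found:                                   # step 1
        for conflict in found:
            suspects &= conflict
    else:                                       # step 2
        suspects &= set(initial_conflict(observations))

    unexamined = set(suspects)
    candidates = set()
    while unexamined:                           # step 3
        suspect = rng.choice(sorted(unexamined))
        conflict = suspend(suspect, observations)
        if conflict is not None:
            suspects &= set(conflict)
            unexamined &= set(conflict)
        else:
            candidates.add(suspect)
        unexamined.discard(suspect)
    return candidates

polybox = ["M1", "M2", "M3", "A1", "A2"]
obs = dict(A=3, B=2, C=2, D=3, E=3, F=10, G=12)   # assumed observations
r1 = (lambda o: o["F"] != o["A"] * o["C"] + o["B"] * o["D"], {"M1", "M2", "A1"})

def toy_suspend(component, o):
    # Stand-in for constraint suspension: here only suspending M2 exposes a contradiction.
    return {"M1", "M3", "A1", "A2"} if component == "M2" else None

result = augmented_diagnose(obs, [r1], polybox, lambda o: set(polybox), toy_suspend)
print(sorted(result))
```

Because R1 applies, the loop starts from three suspects instead of five, and the random choice of suspect does not affect the final candidate set in this example.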

The augmented diagnostic engine must fall back on constraint suspension after 
using its past experience because it can never be sure if it has a complete set of 
generalized rules. There might be some derivations of contradictory values that the 
program has not yet encountered, in which case the generalized rules might miss 
identifying some conflict sets. In order to guarantee that the augmented diagnostic 
engine exonerates all of the components that it is possible to exonerate, the program 
performs constraint suspension on each of the initial suspects. 

Note that falling back on constraint suspension of the initial suspects places a 
lower bound on how fast the augmented diagnostic engine can run. No matter how 
good the generalized rules are, the augmented diagnostic engine will perform con- 
straint suspension on at least each of the final candidates. The generalized rules can 
only be used to save the cost of performing constraint suspension on some components 
that are eventually exonerated.



Figure 2.7: The gate-level description of a carry-lookahead adder, adapted from the 
TTL Data Book. 



Figure 2.8: The derivation and generalization of a contradiction that yields the
conflict set (A21 N22 A22 A11 N12 N01 A13 O12 N14 X2). Generalized values pre-
dicted are in brackets, followed by preconditions, if any, for the rule firing.

2.4 The System in Action: A Carry-Lookahead Adder

We use the carry-lookahead adder in Figure 2.7 to illustrate the augmented algo- 
rithm in action. As we will see, the program first diagnoses an adder that produces 
the output 10 from inputs 6 and 6. It then constructs two conflict sets from two 
derivations of contradictory values in the circuit. It creates two generalized rules 
from those two derivations, and inserts the two rules into the library. In diagnos- 
ing another adder, which produces the output 17 from inputs 7 and 14, one of the 
rules generated in diagnosing the first case applies, but the other does not, and the 
program falls back on constraint suspension to find an additional conflict set. 

2.4.1 The First Case: 6 + 6 = 10 

The first case presented to the system has inputs 6 and 6 (carry-in C0 is 0),
and output 10. There are no generalized rules in the library yet, so the program
proceeds to step two of the augmented diagnostic algorithm. The conflict set found in
Figure 2.8 is (A21 N22 A22 A11 N12 N01 A13 O12 N14 X2). Special-case behavior
rules for and-gates and or-gates can predict the gate's output from just one input in
the obvious situations, as for example A22 in Figure 2.8. Using the special-case rule,
A22's output depends on fewer components, so a smaller conflict set can be created.

Figure 2.9: The derivation and generalization of a contradiction that yields the con-
flict set (A31 N32 A32 X3 N24 O21 N21 N22 A24 A23 O22 A21). Generalized val-
ues predicted are in brackets, followed by preconditions, if any, for the rule firing.

The program performs constraint suspension on X2 in step three, and another
contradiction is derived (Figure 2.9). The conflict set identified is (A31 N32 A32
X3 N24 O21 N21 N22 A24 A23 O22 A21). The diagnostic engine intersects the two
conflict sets, which reduces the suspect set to (A21 N22). Constraint suspension is
performed on A21 and N22 in turn, but no further conflict sets are found.

The program then creates two rules for recognizing the applicability of the two 
derivations of contradictory values. The figures illustrate this generalization process. 
Note that some restrictions on the symbolic values are needed in order to satisfy the 
preconditions of the behavior rules. For example, in Figure 2.8, the behavior rule for
A13 that produced output 1 required both of its inputs to be 1. The symbolic values
on A13's inputs are 1 and (INVERT C0), so when the behavior rule is run during
the generalization process, output 1 is predicted and the precondition (= 1 (INVERT
?C0)) is added to the generalized rule's preconditions. The two rules generated are:

2.4.2 The Second Case: 7 + 14 = 17

Imagine that the augmented diagnostic program is later given another copy of the
adder circuit to diagnose, with the observables A=7, B=14, C0=0, S=17. Here R5 ap-
plies; R4 does not apply because input A1 is 1, not 0 as it requires. Hence, the initial
suspect set is (A31 N32 A32 X3 N24 O21 N21 N22 A24 A23 O22 A21). When con-
straint suspension of A21 is performed, a contradiction is found, yielding the conflict
set (O21 N21 O11 N11 A24 N23 A11 N12 N01 A13 O12 N14 X2 A22 A23 O22 N24
X3 A32 N32 A31). This reduces the suspect set to (A31 N32 A32 X3 N24 O21 N21
A24 A23 O22). Since no further contradictions are found, this is the final candidate
set.
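
The suspect-set reductions in both adder cases are plain set intersections, which the following sketch reproduces. The helper `narrow` is a hypothetical name, and the component names assume the gate labels read A11, N01, O11, O12, O21 and O22.

```python
def narrow(suspects, conflict_set):
    """Keep only the suspects that also appear in the new conflict set."""
    members = set(conflict_set.split())
    return [c for c in suspects if c in members]

# First case (6 + 6 = 10): two conflict sets found by propagation.
first = "A21 N22 A22 A11 N12 N01 A13 O12 N14 X2".split()
second = "A31 N32 A32 X3 N24 O21 N21 N22 A24 A23 O22 A21"
case1_suspects = narrow(first, second)

# Second case (7 + 14 = 17): R5 supplies the initial suspect set, and
# suspending A21 yields one further conflict set.
initial = second.split()
from_suspension = ("O21 N21 O11 N11 A24 N23 A11 N12 N01 A13 O12 N14 X2 "
                   "A22 A23 O22 N24 X3 A32 N32 A31")
case2_suspects = narrow(initial, from_suspension)

print(case1_suspects)
print(case2_suspects)
```

The first intersection leaves (A21 N22); the second removes N22 and A21 from the initial suspect set, leaving the ten-component candidate set.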

The final candidate set (A31 N32 A32 X3 N24 O21 N21 A24 A23 O22) is the
same one that would have been produced without using the generalized rules. The 
augmented diagnostic program, however, finds one of the conflict sets faster than the 
original diagnostic program would have, because of the applicability of R5, which 
was created during diagnosis of the first case. 

2.4.3 Summary 

The diagnosis of the two cases for the adder circuit illustrates three important 
points about the augmented diagnostic algorithm. First, a generalized rule con- 
structed from a single case was applicable to a second case that bears little surface 
resemblance to the initial case. Second, while two rules were generated in diagnosing
the first case, only one of them was applicable in diagnosing the second. Cases can be
"similar" according to one generalized rule, but not "similar" according to another.
This is not surprising since each generalized rule defines similarity by the common
applicability of a particular pattern of inferences. Third, the generalized rules al-
low the program to start with a small initial suspect set, but there might be some
derivations of contradictory values that the program has not yet encountered, so that
some conflict sets are not identified using the generalized rules. Hence, the program
performs constraint suspension on each initial suspect to try to further reduce the
suspect set.



Chapter 3 
Performance Analysis 



The previous chapter illustrated how a diagnostic program can benefit from recog- 
nizing that derivations of conflict sets from earlier examples are applicable to later 
examples. It is important to realize, however, that in order to gain those benefits, 
the program pays the cost of checking all of the generalized rules. Throughout this 
chapter, the term 'utility' refers to the difference between the benefits and the costs. 

There is no a priori reason to expect the benefits of the generalized rules to
outweigh the costs, or vice versa. In this chapter, we present experimental results
measuring the utility of the generalized rules that are constructed during diagnosis 
of the polybox and adder circuits. The experiments measure only the effect of using 
the generalized rules, under the assumption that the cost of creating the generalized 
rules will be amortized over enough cases so as to be negligible. Performance on both 
the polybox and the adder circuit improved using the generalized rules. Hence, at 
least for the selection of cases used in the experiments, the benefits of the generalized 
rules outweighed the costs. 

We also present a breakdown of the operations used in diagnosis, with and without 
generalized rules. This culminates in Section 3.4.3 with an equation for the change 
in performance caused by the generalized rules. In Section 3.7, the equation is used 
to sketch the device characteristics that influence the utility of the generalized rules. 
In Section 3.8, the equation is used to predict how changes to the diagnostic engine 
would affect the utility of the generalized rules. 

3.1 Experiment Description 

We ran experiments to compare the efficiency of the original diagnostic program 
to the efficiency of the augmented diagnostic program on the polybox and adder 
circuits. First, a large space of cases was generated for each circuit. Each experiment 
started with the random selection of a set of training cases and a set of test cases 
from that space. Then, statistics were gathered for each of the following runs: 

1. Each training case is diagnosed using the original diagnostic program (i.e. with-
out generalized rules). This establishes a baseline for comparison.

2. Each training case is diagnosed again. This time, the program creates a new
generalized rule each time it finds a new way of deriving contradictory values.
Thus, in diagnosing the tenth case, the program uses the generalized rules it
created during diagnosis of the previous nine. The results from the second run
are compared with the results from the first run in order to measure the effect
of "continuous" learning.

3. The third run establishes a baseline for the test cases. Each test case is diag- 
nosed without creating or using any generalized rules. 

4. Each test case is then diagnosed again, using the library of generalized rules 
created in the second run. No new generalized rules are constructed. The results 
from the third and fourth runs are compared in order to measure performance 
"after" learning. 

Note that both the second and fourth runs provide a way of measuring the transfer 
of applicability of the generalized rules to new cases. For both the polybox and 
adder circuits, performance improved during continuous learning and was better after 
learning than before. 
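
The four runs can be organized as a small harness. The sketch below is an assumption-laden illustration: `run_experiment` and the signature `diagnose(case, library, learn)`, which returns the cost of one diagnosis and may append to `library` when `learn` is true, are hypothetical, and `toy_diagnose` is a stand-in cost model rather than a diagnostic engine.

```python
import random

def run_experiment(case_space, n_train, n_test, diagnose, seed=0):
    """Average diagnosis cost for the four runs of the protocol."""
    rng = random.Random(seed)
    sample = rng.sample(case_space, n_train + n_test)
    training, test = sample[:n_train], sample[n_train:]

    def average(cases, library, learn):
        return sum(diagnose(c, library, learn) for c in cases) / len(cases)

    library = []  # filled during run 2, reused unchanged in run 4
    return {
        "train_baseline": average(training, None, False),      # run 1
        "train_continuous": average(training, library, True),  # run 2
        "test_baseline": average(test, None, False),           # run 3
        "test_after": average(test, library, False),           # run 4
    }

def toy_diagnose(case, library, learn):
    # Stand-in: a case is cheap when a rule for its "pattern" is already in the library.
    key = case % 3
    if library is not None and key in library:
        return 0.5
    if learn:
        library.append(key)
    return 1.0

results = run_experiment(list(range(20)), 10, 10, toy_diagnose)
print(results)
```

Runs 2 and 4 share one library, so run 4 measures performance with the rules learned in run 2 and without any further learning.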

3.2 Performance Results 

3.2.1 Polybox Experiments 
Cases Generated 

In order to generate cases (sets of observations) for the polybox circuit, we se-
lected 79 sets of inputs at random.^ Then, for each set of inputs we generated the
outputs that would be produced by each of the possible stuck-at faults for the
components. A stuck-at high at a component's port models a wire as always car-
rying the value 1, even when it should be carrying the value 0. This type of error
occurs frequently because the connections between pins and the internals of a chip
come loose. Similarly, stuck-at low indicates that the wire always carries the value
0. Each of the 148 possible stuck-at faults on the inputs or outputs of the circuit
components was considered. For example, multiplier M1 has two three-bit inputs
and a six-bit output. Each of these twelve bits can be stuck either high or low, so
there are a total of 24 possible stuck-at faults for M1. Similarly, adder A1 has two
six-bit inputs and a seven-bit output, each bit of which can be stuck either high or
low, for a total of 38 possible stuck-at faults. The circuit was simulated using each
possible fault model (assuming that the rest of the components were working properly)
to predict the output values.

^There is no significance to this number. There were 2^15 possible sets of inputs. 79 sets of inputs
yielded 4376 input/output combinations, which seemed like enough.
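
The fault counts follow from two faults per port bit. The short check below assumes, consistently with the total of 148, that all three multipliers share M1's port widths and both adders share A1's; the function name is ours.

```python
def stuck_at_faults(port_widths):
    """Two faults (stuck-at high, stuck-at low) per bit of every port."""
    return 2 * sum(port_widths)

multiplier = stuck_at_faults([3, 3, 6])   # two 3-bit inputs, one 6-bit output
adder = stuck_at_faults([6, 6, 7])        # two 6-bit inputs, one 7-bit output
total = 3 * multiplier + 2 * adder        # polybox: three multipliers, two adders

print(multiplier, adder, total)
```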

Results 

We randomly selected 100 training cases and 100 test cases from among those 
generated. The average time taken to diagnose a training case without using any 
generalized rules was 0.77 seconds. The average time to diagnose a training case 
dipped to 0.56 seconds when generalized rules from previous cases were used. Only 
three generalized rules were constructed during this second run, because there are 
only three ways to derive conflict sets in the polybox circuit. The average time to 
diagnose a test case without using any generalized rules was 0.76 seconds. That dipped
to 0.55 seconds when using the three generalized rules created from the training cases.
Thus, the three generalized rules improved performance on the polybox circuit by 
28%. 



3.2.2 Adder Experiments 
Cases Generated 

The space of adder cases was generated by considering all of the possible in-
put/output combinations in which the carry-in bit C0 was 0.^ These were then
filtered to keep only those cases that admitted single-fault candidates. A total of 
1092 cases passed the filter. 

Results 

Performance improved in diagnosing the adder circuit as well. We randomly 
selected 150 training cases and 150 test cases from among those generated. The 
average time taken to diagnose a training case without using any generalized rules 
was 7.09 seconds. The average time to diagnose a training case dipped to 6.23 seconds 
when generalized rules from previous cases were used. A total of 221 generalized rules 
were constructed during that run. The average time to diagnose a test case without 
using any generalized rules was 7.07 seconds. That dipped to 5.66 seconds when 
using the 221 generalized rules created from the training cases. On average, 3.15 
generalized rules applied to each test case and 0.56 additional conflict sets were found 
using constraint suspension. Of the 5.66 seconds, 0.70 seconds was spent checking
the generalized rules. Overall, the generalized rules improved performance on the
adder circuit by 20%.

^The restriction that C0 be 0 is physically plausible because the carry-in bit is not needed in some
uses of an adder, in which case the pin is tied to ground. For these experiments, the restriction is
more pragmatic than principled: it took several days to generate the cases even with the restriction.
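
The reported percentages are the relative reductions in average test-case time. A quick check using only the figures quoted in the text (the function name is ours):

```python
def improvement(before, after):
    """Fractional reduction in average diagnosis time."""
    return (before - after) / before

polybox_test = improvement(0.76, 0.55)
adder_test = improvement(7.07, 5.66)
adder_train = improvement(7.09, 6.23)   # continuous learning on the training cases

print(round(100 * polybox_test), round(100 * adder_test), round(100 * adder_train))
```

The test-case figures round to 28% for the polybox and 20% for the adder; the gain during continuous learning on the adder training cases is smaller, about 12%.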



3.3 Case Selection for Experiments 

Unfortunately, the correct way to benchmark a performance learning system is 
still an open question. One difficult issue is the distribution of training and test 
cases. We sampled cases uniformly from among those generated, but any practical 
circuit would fail in some ways more frequently than in other ways. By paying more 
careful attention to how we generated cases (e.g. implementing the circuits with 
TTL chips and then simulating the typical ways that TTL chips fail) it would have 
been possible to produce a distribution of cases that was arguably more realistic. 
We do not claim to have chosen the "correct" distribution of examples. Hence, the 
experimental results only serve to illustrate that using EBG to generalize derivations
of conflict sets can improve performance and to motivate the analysis of the factors 
affecting whether it will do so. 

Another issue is how robust the results are. What if smaller or larger training 
and test sets were selected? What if different random samples of the same size were 
selected? Appendix B reports experiments showing that, with either of those changes 
to the training and test sets, performance still improves. 

3.4 Cost Breakdown 

This section highlights the costs of the different operations involved in diagnosis, 
both with and without the learning component. The resultant cost formula is then
used to identify the sources of the speedup in the experiments. Section 3.7 uses the 
formula to give qualitative characteristics of the circuits for which EBG will improve 
performance and Section 3.8 uses the cost breakdown to argue that changes in the 
relative efficiency of the operations that are used by the original diagnostic program 
can alter the direction and the magnitude of the performance change resulting from 
using EBG. 

3.4.1 Without Generalized Rules 

The costs of running the original diagnostic program can be split into three cate- 
gories. First there are the costs of propagating values. Second, there are the costs of 
switching assumptions about which components are working, which happens during 
constraint suspension. Finally, there are the costs of constructing conflict sets when 
contradictory values are found. This section analyzes these three costs in turn.

1. Assume all components are working. Propagate values from in-
puts to outputs to find an initial contradiction, yielding an initial
suspect set.

2. While there are still unexamined suspects: 

(a) Choose an unexamined suspect at random. 

(b) Perform constraint suspension on the suspect. 

(c) If a contradiction is found, form conflict set, which reduces 
suspect set. 

(d) If no contradiction is found, add suspect to candidate set. 

(e) Remove suspect from unexamined suspect set. 



Figure 3.1: The Original Diagnostic Algorithm 

Value Propagation Costs 

Value propagation costs fall into three categories: 

• Binding the variables on the left-hand sides of behavior rules.

• Computing the conclusions of the rules, given the variable bindings. 

• Recording the conclusion and maintaining dependency information. 

A constraint network [Ste80] implements the propagation of values through the
circuit. At each node of the circuit, the program keeps a list of behavior rules that
can propagate a value from that node. When a new value is asserted at a node,
each of the associated behavior rules checks if it is ready to fire (a multiplier rule,
for example, is not ready until both of its inputs are asserted). This method is more
efficient than pattern-directed rule invocation because the behavior rules do not need
to search the entire database of assertions to find possible variable bindings.^ All of
the behavior rules require binding of approximately the same number of variables,
so we approximate the variable binding costs for a rule firing as a constant, $k_1$. We
count B, the number of behavior rule firings.

There is also a cost to computing the conclusions of the behavior rules, given
variable bindings. For example, given the values at its inputs, a behavior rule for
M1 multiplies them. All of the component behavior rules for our circuits compute
either single arithmetic or logic operations, which we estimate as a constant cost, $k_2$.
Single arithmetic and logic operations take a negligible amount of time, relative to
the other costs incurred, so $k_2$ will be negligible.

^For historical reasons, our implementation is not quite this clean. Pattern-directed invocation is
used for component rules, while the constraint network is used for wires and for detecting contradic-
tions. As discussed in Section 3.8.1, the main inefficiencies in using pattern-directed rule invocation
occur in propagating values through wires and detecting contradictions. Thus, our implementation
eliminates the major sources of inefficiency in pattern-directed rule invocation.

Figure 3.2: Polybox

Finally, there are costs to asserting a new value. The program records dependen-
cies so that it can find the components which supported derivations of contradictory
values and also so that it can efficiently change its assumptions about which compo-
nents are working. Each time a behavior rule is run, a justification is recorded stating
that the antecedents of the rule support the conclusion. For example, if M1's forward
behavior rule predicts the value 6 at X from 3 at A and 2 at C, the justification states
that X is 6 as long as A is 3, C is 2, and M1 is assumed to be working. The con-
clusion is then placed in the database. In addition, each antecedent assertion must
record that it is in a new justification. We make the simplifying assumption that
all justifications involve the same number of assertions, so that the cost of recording
a justification is a constant, $k_3$. One justification is installed for each behavior rule
firing, so the cost of asserting a new value is $k_3 * B$.

$$C_{\text{RULE-RUNNING}} = (k_1 + k_2 + k_3) * B$$
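
To make the quantity B concrete, the following sketch propagates values forward through a toy polybox, records one justification per rule firing, and counts the firings. It is a simplification under stated assumptions: the `POLYBOX` table and `propagate_forward` are hypothetical, only forward behavior rules are modeled, and the loop rescans all components instead of keeping rule lists at each node as a constraint network would.

```python
import operator

# Hypothetical encoding of the polybox: name -> (operation, input nodes, output node).
POLYBOX = {
    "M1": (operator.mul, ("A", "C"), "X"),
    "M2": (operator.mul, ("B", "D"), "Y"),
    "M3": (operator.mul, ("C", "E"), "Z"),
    "A1": (operator.add, ("X", "Y"), "F"),
    "A2": (operator.add, ("Y", "Z"), "G"),
}

def propagate_forward(inputs, components=POLYBOX):
    """Return predicted values, justifications, and B (number of rule firings)."""
    values = dict(inputs)
    justifications = {}   # node -> (component assumed working, antecedent nodes)
    firings = 0
    progress = True
    while progress:
        progress = False
        for name, (op, ins, out) in components.items():
            # A rule is ready only when all of its inputs have been asserted.
            if out not in values and all(n in values for n in ins):
                values[out] = op(*(values[n] for n in ins))
                justifications[out] = (name, ins)
                firings += 1
                progress = True
    return values, justifications, firings

values, justifications, B = propagate_forward(dict(A=3, B=2, C=2, D=3, E=3))
print(values["X"], values["F"], B)
```

With 3 at A and 2 at C, M1 predicts 6 at X, and the recorded justification names M1 together with the antecedents A and C.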



Context Switching Costs 

When the diagnostic engine performs constraint suspension on a component, it
must suspend the assumption that the component is working. If the component
has been used to predict values, those value assertions must also be removed. For
example, if constraint suspension is performed on M1, the justification for asserting
6 at X is no longer valid. If that assertion is not supported by any other justification,
the assertion must be removed. Removing that assertion may require the suspension
of still further justifications and the removal of other assertions.

If constraint suspension is performed on M2 at some later time, the assumption
that M1 is working will be reactivated. Our truth-maintenance system (a JTMS)
caches (rather than erases) justifications that are no longer valid so that it can avoid
re-running behavior rules. In this case, when the assumption that M1 is working
is reactivated, the justification for 6 at X becomes valid again, and the assertion is
brought back into the database, without re-running M1's behavior rule.

Each addition or removal of an assertion is caused by the activation or deactivation
of a justification. The combination of the activation of one justification and the
addition of one assertion, or the deactivation of one justification and the removal of
one assertion, takes approximately constant time, $k_4$. Thus, the costs of switching
contexts (sets of assumptions about which components are working) can be measured
by counting TV (for truth-value changes), the number of assertions brought "in" to
the database or removed from the database.

$$C_{\text{CONTEXT-SWITCH}} = k_4 * TV$$
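
The count TV can be illustrated with a toy dependency table. The sketch below assumes each predicted node has exactly one justification, which the thesis does not require; `out_assertions` and the table layout are hypothetical.

```python
def out_assertions(justifications, suspended):
    """Nodes that lose support, directly or transitively, when `suspended` is retracted."""
    out = set()
    changed = True
    while changed:
        changed = False
        for node, (component, antecedents) in justifications.items():
            if node in out:
                continue
            if component == suspended or any(a in out for a in antecedents):
                out.add(node)
                changed = True
    return out

# One justification per predicted node: node -> (component, antecedent nodes).
justifications = {
    "X": ("M1", ("A", "C")),
    "Y": ("M2", ("B", "D")),
    "Z": ("M3", ("C", "E")),
    "F": ("A1", ("X", "Y")),
    "G": ("A2", ("Y", "Z")),
}

removed = out_assertions(justifications, "M1")
# Each assertion goes out once on suspension and comes back in once on reactivation.
tv = 2 * len(removed)
print(sorted(removed), tv)
```

Suspending M1 removes the prediction at X and, through it, the prediction at F; because the justifications are cached, reactivating M1 brings both back without firing any behavior rule.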

Conflict Set Construction Costs 

Once the diagnostic engine predicts contradictory values, it finds the components 
used in the derivation and notifies the system that there is a new conflict set. To find 
the components supporting the assertions of contradictory values, the program simply 
traces back through the justifications that the two contradictory assertions depend on. 
We approximate the cost of finding the components supporting a contradiction as a
constant, $k_5$. Each conflict set must be recorded and intersected with the suspect set,
incurring another constant cost, $k_6$. We count the number of conflict sets constructed,
N.

$$C_{\text{CONFLICT-SET}} = (k_5 + k_6) * N$$
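
Tracing back through justifications to collect the supporting components can be sketched recursively. The table layout and the function name are hypothetical, and observed nodes are simply those without a justification.

```python
def supporting_components(node, justifications):
    """Components whose correct behavior was assumed in deriving the value at `node`."""
    if node not in justifications:   # an observed value depends on no component
        return set()
    component, antecedents = justifications[node]
    support = {component}
    for antecedent in antecedents:
        support |= supporting_components(antecedent, justifications)
    return support

justifications = {
    "X": ("M1", ("A", "C")),
    "Y": ("M2", ("B", "D")),
    "F": ("A1", ("X", "Y")),
}

# A predicted F that contradicts the observed F yields this conflict set.
conflict_set = supporting_components("F", justifications)
print(sorted(conflict_set))
```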

Cost Formula 

The following formula summarizes the breakdown of costs incurred in first-principles 
diagnosis: 



$$
\begin{aligned}
C &= C_{\text{PROPAGATION}} + C_{\text{CONTEXT-SWITCH}} + C_{\text{CONFLICT-SET}} \\
  &= (k_1 + k_2 + k_3) * B + k_4 * TV + (k_5 + k_6) * N
\end{aligned}
$$

1. Retrieve from the library for the device the generalized rules for
noticing conflict sets. Check the applicability of each rule and
intersect the conflict sets identified to form the initial suspect
set.

2. If there are no conflict sets found using generalized rules, propa-
gate values from inputs to outputs to find an initial contradiction,
yielding an initial suspect set.

3. While there are still unexamined suspects:

(a) Choose an unexamined suspect at random.

(b) Perform constraint suspension on the suspect.

(c) If a contradiction is found, form a conflict set, which reduces
the suspect set.

(d) If no contradiction is found, add the suspect to the candidate
set.

(e) Remove the suspect from the unexamined suspect set.

4. Use EBG to generalize each derivation of contradictory values
found by propagating values and add the new rules to the library.

Figure 3.3: The Augmented Diagnostic Algorithm



3.4.2 With Generalized Rules 

The augmented diagnostic algorithm uses all of the operations the original al-
gorithm uses, but it also checks the applicability of generalized rules. Checking a
generalized rule requires binding variables and checking preconditions, which we ap-
proximate as a constant cost operation, $k_7$. We count G', the number of generalized
rules checked. When a generalized rule is applicable, a new conflict set is recorded, which
incurs cost $k_5$. We count A', the number of conflict sets found using generalized rules.

Cost Formula 

The following formula summarizes the breakdown of costs incurred in diagnosis
aided by generalized rules:

$$
\begin{aligned}
C' &= C'_{\text{GENERALIZED-RULE-CHECKING}} + C'_{\text{PROPAGATION}} + C'_{\text{CONTEXT-SWITCH}} + C'_{\text{CONFLICT-SET}} \\
   &= k_7 * G' + k_5 * (A' + N') + (k_1 + k_2 + k_3) * B' + k_4 * TV' + k_6 * N'
\end{aligned}
$$

3.4.3 Cost Differential 

The utility of generalizing derivations of conflict sets can be calculated as the 
difference in cost between diagnosis without using the generalized rules and diagnosis 
with the generalized rules. If the difference is positive, the generalized rules have 
improved the performance of the system. 

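$$
\begin{aligned}
C - C' &= -C'_{\text{GENERALIZED-RULE-CHECKING}} \\
       &\quad + (C_{\text{PROPAGATION}} - C'_{\text{PROPAGATION}}) \\
       &\quad + (C_{\text{CONTEXT-SWITCH}} - C'_{\text{CONTEXT-SWITCH}}) \\
       &\quad + (C_{\text{CONFLICT-SET}} - C'_{\text{CONFLICT-SET}}) \\
       &= -k_7 * G' + (k_1 + k_2 + k_3) * (B - B') + k_4 * (TV - TV') \\
       &\quad + k_5 * (N - N' - A') + k_6 * (N - N')
\end{aligned}
$$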
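
The two cost formulas and their difference can be written out as functions to confirm that they agree term by term. Everything numeric below is hypothetical: the constants and counts are made-up integers chosen only so the arithmetic is exact, and the names `Counts`, `cost` and `differential` are ours.

```python
from dataclasses import dataclass

@dataclass
class Counts:
    B: float       # behavior rule firings
    TV: float      # truth-value changes
    N: float       # conflict sets constructed by propagation
    G: float = 0   # generalized rules checked (0 for the original program)
    A: float = 0   # conflict sets found by generalized rules

def cost(c, k):
    """Total cost; k maps 1..7 to the constants k1..k7."""
    return (k[7] * c.G + k[5] * (c.A + c.N)
            + (k[1] + k[2] + k[3]) * c.B + k[4] * c.TV + k[6] * c.N)

def differential(orig, aug, k):
    """C - C', written as in the cost differential formula."""
    return (-k[7] * aug.G
            + (k[1] + k[2] + k[3]) * (orig.B - aug.B)
            + k[4] * (orig.TV - aug.TV)
            + k[5] * (orig.N - aug.N - aug.A)
            + k[6] * (orig.N - aug.N))

k = {1: 2, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 1}         # hypothetical constants
original = Counts(B=100, TV=200, N=3)                   # hypothetical counts
augmented = Counts(B=60, TV=120, N=1, G=50, A=2)

utility = differential(original, augmented, k)
print(cost(original, k), cost(augmented, k), utility)
```

A positive result means the generalized rules paid for themselves; with these made-up numbers the saving in rule firings and truth-value changes outweighs the cost of checking 50 rules.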

3.5 Breakdown of Results 

It is clear from the last two columns of Figures 3.4 and 3.5 that the generalized 
rules improve performance because they reduce the number of behavior rule firings 
and context switches required during diagnosis. The reason is that, using the single- 
fault assumption, the program forms the initial suspect set by intersecting all of the 
conflict sets that the generalized rules identify. After that, the constraints imposed by 
some suspect will always be suspended, so it never performs the value propagations 
that require all of the initial suspects to be working. Without the generalized rules, on 
the other hand, some values are propagated assuming that all of those components 
are working, in order to identify the conflict sets. In Section 4.2.2 we will return 
to the key role played by the single-fault assumption, and argue that EBG cannot
significantly reduce the number of value propagations performed by a
multiple-fault diagnostic engine such as GDE [dKW87].

Figure 3.4: Results from Polybox Experiments. All numbers are averages over the
100 cases run.

Figure 3.5: Results from Adder Experiments. All numbers are averages over 150
cases.

3.6 Utility of Individual Generalized Rules 



The equation makes it clear that the utility of an individual generalized rule in
speeding up diagnosis depends on how frequently the rule applies, how expensive
it is to check, and how much benefit is gained from it when it is applicable. The
importance of a rule applying frequently is obvious, since all the terms except $-k_7 * G'$
drop out of the equation when the rule is not applicable: no benefit is gained from the
rule when it is not applicable. One term in the cost differential formula is $-k_7 * G'$,
which makes it clear that if checking a generalized rule is very expensive (i.e. $k_7$ is
high), the utility of the rule may be low or even negative. Finally, the rule will have
greater utility the more it reduces the number of behavior rules fired (B - B') and
the context switching costs (TV - TV').

$$\text{benefit of applying once} = \frac{\text{propagation time saved per case}}{\text{number of rules applicable per case}} = \frac{7.07 - (5.66 - .70)}{3.15} = .67 \text{ seconds}$$

$$\text{cost of checking once} = \frac{\text{time to check all rules per case}}{\text{number of rules checked per case}} = \frac{.70}{221} = .0032 \text{ seconds}$$

$$\text{benefit/cost ratio} = \frac{\text{benefit of applying once}}{\text{cost of checking once}} = 211$$

Figure 3.6: Derivation of benefit-cost ratio of a generalized rule based on results from
the adder experiment.

On average, the benefits gained from the applicability of a generalized rule to an
adder case equaled the cost of checking a rule 211 times. That figure is derived in 
Figure 3.6. This means that generalized rules that, on the average, were applicable 
less than once every 211 examples slowed the system down. Moreover, if checking 
a generalized rule had been four times as expensive, or if there had been roughly 
four times as many, and all other factors remained constant, using the generalized 
rules would have slowed the system down overall. This analysis emphasizes that the 
benefits will not always outweigh the costs: the performance effect of the generalized 
rules is an experimental question. 
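
The benefit-cost ratio and the break-even claim can be reproduced from the adder figures quoted earlier. The sketch assumes that all 221 library rules are checked on every test case and that total checking time scales linearly with either the per-rule cost or the number of rules; the variable and function names are ours.

```python
baseline = 7.07        # seconds per test case without generalized rules
with_rules = 5.66      # seconds per test case with generalized rules
checking = 0.70        # seconds per case spent checking generalized rules
applicable = 3.15      # generalized rules applicable per case
rules_checked = 221    # assumed: every library rule is checked on every case

saved = baseline - (with_rules - checking)   # propagation time saved per case
benefit_once = saved / applicable
cost_once = checking / rules_checked
ratio = benefit_once / cost_once

def net_gain(multiplier):
    """Seconds saved per case if total checking time were `multiplier` times larger."""
    return saved - multiplier * checking

print(round(benefit_once, 2), round(cost_once, 4), round(ratio))
print(net_gain(1) > 0, net_gain(4) > 0)
```

With the measured checking cost the net gain is positive; at four times the checking cost (or four times as many rules) it turns negative, which matches the break-even argument above.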

3.7 Aggregate Analysis: Device Characteristics 

The differential cost formula of Section 3.4.3 is also helpful in analyzing the effect
of all of the generalized rules taken together. This leads to a characterization of the
kinds of devices for which generalizing conflict set derivations will improve diagnostic
performance. The devices that will gain the most are those in which only a few
components ever fail, the behavior of components is inexpensive to compute, and 
the topology of the device is such that conflict sets tend to have few components in 
common. 

The fewer different components of the device actually fail, the more effective this 
use of EBG will be. This is true because if only a few components of the device 
ever break, only a few patterns of value propagations will lead to predictions of 
contradictory values. Hence, only a few generalized rules will be constructed, and 
the term -kr * G will be small.

The simpler the component behaviors, the higher the utility of the generalized
rules. Each generalized rule encapsulates a pattern of inferences, and checking the
preconditions of a rule incurs all of the behavior costs of the rule firings encapsulated.
For example, checking the precondition (= ?F (+ (* ?A ?C) (* ?B ?D)))
in R1 involves the same two multiplications and one addition that would have been
performed by the behavior rules for M1, M2 and A1. The arithmetic and logic
operations performed by the components in the two circuits considered here are
inexpensive, so the cost of checking the generalized rules is not prohibitive. Suppose
the components computed square roots instead. Both the cost of running a behavior
rule and the cost of checking a generalized rule would be higher, so
at first glance it is not clear what the effect on the utility of the generalized rules
would be. The key point is that M1's behavior may be encapsulated in more than
one generalized rule. Thus, checking the generalized rules would incur more square
root computations than are saved by reducing the number of behavior rule firings
(B - B'). Hence, the more expensive the component operations, the lower the utility of
the generalized rules.

If the topology of the device is such that conflict sets tend to have few components 
in common, the utility of the generalized rules will be higher. This follows from the 
fact that the benefits gained from the applicability of a generalized rule depend on the 
number of suspects that it eliminates. The more suspects are eliminated, the greater 
the number of behavior rule firings (B - B') and truth-value changes (TV - TV') that
are saved. If every conflict set includes several components that are not in any other
conflict set, then each generalized rule, when it is applicable, will exonerate several
suspects. On the other hand, if the device has n components and every conflict set
has n - 1 components, each generalized rule that applies can only reduce the suspect
set by one component. EBG will be most effective when the topology of the device 
is such that conflict sets have few components in common. 
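
The effect of conflict set overlap on the suspect set can be sketched with plain set
operations. This is a minimal illustration, assuming a single fault so that the suspect
set is the intersection of the conflict sets found so far; the function names are
hypothetical and are not taken from the implemented program.

```python
def suspects_after(conflict_sets):
    """Under a single-fault assumption, the suspects are the components
    common to every conflict set found so far."""
    sets = [frozenset(c) for c in conflict_sets]
    if not sets:
        raise ValueError("need at least one conflict set")
    return frozenset.intersection(*sets)


def exonerated_by(suspects, conflict_set):
    """Components removed from the suspect set when a new conflict set applies."""
    return frozenset(suspects) - frozenset(conflict_set)
```

With the two polybox conflict sets (M1 M2 A1) and (M1 M3 A1 A2), `suspects_after`
leaves only M1 and A1. In the unfavorable case of a five-component device with a
four-component conflict set, `exonerated_by` removes a single suspect.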



3.8 Aggregate Analysis: Alternative Diagnostic Engines 

The differential cost formula of Section 3.4.3 also makes it clear that using EBG 
to generalize conflict set derivations may either improve or reduce diagnostic speed 
depending on the performance of the original diagnostic engine, including the relative 
costs of behavior rule firing, TMS operations, and checking generalized rules. For 
example, if checking a generalized rule is orders of magnitude more expensive than 
firing a behavior rule or performing a TMS operation, the generalized rules will cause 
a significant deterioration in performance. This section describes four changes to the 
implementation of the diagnostic program and discusses the effect each would have 
on the utility of this use of EBG. The overall theme is that reducing the cost of rule 
firing and context switching reduces the benefits, while reducing the cost of checking 
generalized rules increases the ability of EBG to improve diagnostic performance.

3.8.1 Pattern-Directed Rule Invocation

If our diagnostic engine used pattern-directed rule invocation, EBG would improve 
performance even more than in the experiments. The reason is that pattern-directed
rule invocation is less efficient than a constraint network for propagating values in 
a circuit, so using pattern-directed rule invocation would increase the size of ki, 
the cost of matching the preconditions of a behavior rule. An earlier version of 
our diagnostic program in fact did use pattern-directed rule invocation to trigger 
behavior rules. Using that version, the total time saved using the generalized rules was 
greater than the time saved using the generalized rules with the constraint network 
implementation. The experiments are reported in Appendix B. 

To understand why pattern-directed rule invocation is less efficient than a con- 
straint network for value-propagation in circuits, consider the wire rule below. The 
assertion of a new value anywhere in the circuit causes the pattern-matcher to attempt 
to match the new value with every wiring assertion in the database. The constraint 
network, on the other hand, can find by one-step lookup the wires connected to the 
location at which the new value is asserted. 

;; When a value is known at one end of a wire, assert it at the other end.
(defrule WIRE-EQUALITY-FORWARD
  IF (and [wire ?terminal1 ?object1 ?terminal2 ?object2]
          [value-of ?terminal1 ?object1 ?value])
  THEN
  (assert [value-of ?terminal2 ?object2 ?value]))

Similarly, a pattern-directed rule that detects when two contradictory values are 
asserted at the same location will check each new value asserted against every value 
asserted anywhere in the circuit. The constraint network, on the other hand, checks
only those values asserted at the appropriate location. 
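
The difference between the two lookup strategies can be sketched as follows. This is
not the original implementation: the wiring list is a hypothetical fragment, and the
two functions merely contrast scanning every wire assertion with a one-step indexed
lookup.

```python
from collections import defaultdict

# Hypothetical wiring: (terminal1, object1, terminal2, object2) tuples.
WIRES = [
    ("out", "M1", "in1", "A1"),
    ("out", "M2", "in2", "A1"),
    ("out", "M2", "in1", "A2"),
    ("out", "M3", "in2", "A2"),
]


def connected_by_scan(wires, terminal, obj):
    """Pattern-matcher style: test every wire assertion against the new value."""
    examined = 0
    found = []
    for t1, o1, t2, o2 in wires:
        examined += 1
        if (t1, o1) == (terminal, obj):
            found.append((t2, o2))
    return found, examined


def build_index(wires):
    """Constraint-network style: index the wires by source location once."""
    index = defaultdict(list)
    for t1, o1, t2, o2 in wires:
        index[(t1, o1)].append((t2, o2))
    return index


def connected_by_lookup(index, terminal, obj):
    """One-step lookup of the locations wired to (terminal, obj)."""
    return index.get((terminal, obj), [])
```

The scan examines every wire for each new value, so its cost grows with the size of the
circuit, whereas the indexed lookup touches only the wires attached to the location of
the new value.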

3.8.2 Using an ATMS 

Using an ATMS [dK86] rather than our JTMS would eliminate the context switching
time during diagnosis, at the expense of increasing the time to run rules and record
conflict sets. As a result, it is not clear whether generalizing conflict set derivations 
would have more or less utility in speeding up an ATMS implementation of the di- 
agnostic engine. Instead of "inning" and "outing" assertions from a database when 
different contexts are considered, the ATMS keeps track of the minimal contexts in 
which any assertion can be supported. If a predicted value can be supported in two 
ways, two minimal contexts are recorded for it. When a conflict set is found, rather 
than retracting the assumption that some component is working, the ATMS leaves 
the database intact, but refuses to run any more behavior rules in contexts that 
include all of the components of the conflict set. 
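
The bookkeeping described above can be illustrated with a small sketch. It is a
simplification under stated assumptions, not the ATMS of [dK86]: a label is modeled as
a Python set of frozensets (one per minimal context), and both function names are
hypothetical.

```python
def add_environment(label, env):
    """Add env to a label, keeping only minimal environments."""
    env = frozenset(env)
    if any(existing <= env for existing in label):
        return label  # an equal or smaller supporting context is already recorded
    # Drop recorded contexts that the new, smaller context makes redundant.
    return {e for e in label if not env <= e} | {env}


def is_blocked(env, conflict_sets):
    """True if env assumes that every component of some conflict set is working."""
    env = frozenset(env)
    return any(frozenset(c) <= env for c in conflict_sets)
```

A value supported in two independent ways ends up with two minimal contexts in its
label, and a behavior rule would be skipped in any context for which `is_blocked`
returns true.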

We implemented a version of the ATMS candidate generator in GDE [dKW87],
modified to take advantage of the single fault assumption. The modified version
refuses to run behavior rules in contexts containing all of the components in the 
suspect set (the intersection of the conflict sets found so far). Using the ATMS 
eliminates context switching costs, but the cost of running a rule is higher than with 
a JTMS because the context label of the conclusion of the rule must be updated. 
Recording the labels also incurs a space cost: in diagnosing the lookahead adder 
circuit, more than 3600 different labels had to be stored. In addition, recording a 
conflict set is more expensive because all of the context labels containing the conflict 
set must be updated. Overall, EBG improved performance in diagnosis of the polybox 
circuit, but we were not able to evaluate how effective EBG is in speeding up diagnosis 
of the adder circuit because the diagnostic engine was too slow to permit experiments 
in the time available. 

3.8.3 Keeping Dependencies Only 

An alternative diagnostic engine might rerun behavior rules rather than use a 
TMS to cache deductions. This might reduce the cost of context switching, and 
hence reduce the benefits of using EBG. Our implementation uses a JTMS both 
to keep track of dependencies during value propagation and to cache behavior rule 
firings so that they do not need to be re-run when the diagnostic engine changes its 
assumptions about which components are working. However, running behavior rules 
is not very expensive for circuits such as polybox and the lookahead adder, because the 
constraint network makes triggering a behavior rule cheap, and evaluating the body 
of a behavior rule involves only a single arithmetic or logic operation. An alternative 
diagnostic engine might simply clear all of the values asserted in the circuit and 
propagate anew when it performs constraint suspension on a new suspect. That 
might be faster than using a JTMS to remove assertions from the database and add 
others in. If context switching in the alternative diagnostic engine were faster than 
in our diagnostic engine, the benefits from using EBG would be reduced. 

3.8.4 Reducing Costs of Checking Generalized Rules 

The utility of EBG would be higher if the generalized rules could be checked less 
expensively. The time necessary to check generalized rules can be reduced by sharing 
some computation, as described below. 

Sharing variable bindings can reduce the cost of checking generalized rules. All 
of the generalized rules use only the observed values of the inputs and outputs of 
the circuit. Some of the inputs and outputs are used in checking more than one 
generalized rule. If the variables representing the input and output observations 
were bound once and then used in checking all of the generalized rules, rather than 
binding variables separately when checking each generalized rule, the time necessary 
to check all of the generalized rules would decrease.

Some computations are repeated in more than one generalized rule, and those
computations could also be shared. For example, the expression (* ?A ?C) appears
in both R1 and R2. Since multiplication is very inexpensive, it probably would not
pay to store the product and use it to check both R1 and R2. If more expensive
operators appeared in the rules' preconditions, however, which would occur if the 
circuit components' behavior were more expensive to compute, sharing the compu- 
tation of common subexpressions might reduce the cost of checking the generalized 
rules. This is analogous to the Rete Net idea of sharing the evaluation of predicates 
that appear in multiple rules. Here, however, what would be shared is the evaluation 
of subexpressions that are part of different predicates. 
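
Sharing the evaluation of common subexpressions can be sketched with a cache keyed by
the subexpression itself. The sketch is an assumption-laden illustration: expressions
are represented as nested tuples, R1's expression is taken from the text, and R2's
expression is a made-up stand-in whose only relevant property is that it also contains
(* ?A ?C).

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, bindings, cache, counter):
    """Evaluate a prefix expression, reusing cached subexpression values.

    Pass cache=None to evaluate without sharing. counter records how many
    times each operator was actually applied.
    """
    if isinstance(expr, str):
        return bindings[expr]
    if isinstance(expr, (int, float)):
        return expr
    if cache is not None and expr in cache:
        return cache[expr]
    op, left, right = expr
    value = OPS[op](evaluate(left, bindings, cache, counter),
                    evaluate(right, bindings, cache, counter))
    counter[op] = counter.get(op, 0) + 1
    if cache is not None:
        cache[expr] = value
    return value


# R1's expression comes from the text; R2's is hypothetical.
R1_EXPR = ("+", ("*", "?A", "?C"), ("*", "?B", "?D"))
R2_EXPR = ("+", ("*", "?A", "?C"), ("-", "?G", ("*", "?C", "?E")))
```

Evaluating both expressions with one shared cache performs the multiplication
(* ?A ?C) once instead of twice, at the price of storing intermediate results. Whether
that trade pays off depends, as noted above, on how expensive the operators are.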

3.9 Trading Off Precision for Speed 

One direction for future research is to explore the effects of trading off precision for 
speed. The augmented diagnostic system is guaranteed to produce the same candi- 
dates that the original diagnostic engine would produce. The generalized rules never 
construct incorrect conflict sets, so no components are mistakenly exonerated. The 
system performs constraint suspension on any components that are not exonerated 
by generalized rules, so every component that can be exonerated is. As the program 
accumulates experience, more suspects are exonerated by generalized rules and hence 
fewer are exonerated by constraint suspension. At the risk of missing the exoneration 
of a few components, an optimistic augmented algorithm could skip the constraint
suspension step once it had accumulated a large library and deem any component 
in the initial suspect set a valid candidate. The proposed candidate generator would 
produce correct diagnoses, but perhaps less precise ones than are possible. 

Table 3.1 summarizes the speed gained and the number of extra candidates gener- 
ated using generalizations from more and more training examples. The more training 
examples were used, the more precise the final candidate sets were. More results can 
be found in Appendix B. Of course, the effects of trading off precision for speed 
depend on the larger diagnostic context. Can the extra hypotheses be eliminated 
easily at the next stage, or do they prove very costly? That question is beyond the 
scope of this research. 

3.10 Conclusion 

Our experimental results demonstrate that EBG can improve diagnostic perfor- 
mance on both the polybox circuit and the adder circuit. The differential cost for- 
mula of Section 3.4.3 makes it clear that the change in speed resulting from this use 
of EBG depends on characteristics of the circuit and on the relative costs of rule 
running, TMS operations, and checking generalized rules. If the device has only a 
few failure modes, has component behaviors that are inexpensive to compute, and

Table 3.1: Speed vs. precision tradeoff in diagnosis



a topology such that the conflict sets have only a small overlap, the use of EBG 
may speed up the system significantly. Reducing the cost of running behavior rules 
and switching contexts for the original diagnostic engine will reduce the utility of 
using EBG to generalize conflict set derivations, while reducing the cost of checking 
generalized rules will increase the utility. 



Chapter 4 

The Sources of Power in EBG 



The learning program that was analyzed in the previous chapters used EBG to con- 
struct generalized rules that could recognize when a previous derivation of a conflict 
set was applicable to a new set of observations. In later chapters, EBG will be used
to generalize based on other kinds of similarities. In this chapter we step back from 
the particular uses to examine EBG as a tool. 

We analyze in turn the two most common uses of EBG, generalizing successful 
problem solving episodes and generalizing explanations of failures. For each tech- 
nique, the sources of power of EBG are analyzed. One conclusion of the analysis is 
that no "operationality criterion" can guarantee that all of the generalized rules that 
are constructed will have a beneficial effect on problem solving speed. The analysis of 
each technique culminates in a qualitative characterization of the problems for which 
EBG is likely to improve performance. 

The analysis in this chapter is motivated both by the analysis of the experimental 
results in the previous chapter and by previous research that demonstrates that EBG 
will sometimes but not always improve performance [Min85, Min88, TN88]. Previous 
research has tried to understand the effect of EBG on performance by analyzing 
the utility of individual generalized rules [MCE+87, Min88, TN88]. By contrast, 
we believe there are characteristics of problem formulations, problem solvers, and 
distributions of examples that will affect the utility of all of the generalized rules. 
We try to expose them by analyzing the effects of EBG in terms of changes to a 
problem solver's search strategy. 

Throughout the chapter, problem solving is formulated as search. The problem 
solver starts with an initial state and a set of operators that create new states (or 
search nodes) from ones that have been constructed already. The operators have pre- 
conditions that indicate which states they can be applied to. Normally, the problem 
solver's task is to find a single state that satisfies the given goal criteria; alternatively, 
the task may be to find all of the states in a finite search space that satisfy the goal. 

As in the previous chapter, we ignore the cost of creating generalized rules and
focus only on the cost of checking them, because we assume that the problem solver
can amortize the cost of generating a rule over a large number of cases.

4.1 Generalizing Successful Search Paths 

This section analyzes the most popular use of explanation-based generalization, 
generalizing successful search paths. The problem solver remembers and encapsulates 
a sequence of operator applications that led to a goal state in solving one problem. In 
solving each future problem, it checks whether that operator sequence is applicable 
(the preconditions of all the operators are satisfied) and leads to a goal state. If no 
remembered operator sequence is useful, it falls back on its original techniques for 
searching the space. STRIPS' construction of macro-operators [FHN72] and SOAR's 
chunking [LNR87] are two other mechanisms for encapsulating successful sequences. 
While they use different mechanisms, the resulting generalizations are the same as 
those that would be constructed using EBG [RL86]. 
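
The control structure just described can be sketched on a toy search problem. Nothing
here comes from the implemented program: the integer state space, the operators, and
the representation of a remembered sequence as a (precondition, jump) pair are all
assumptions chosen for brevity.

```python
from collections import deque


def search(start, is_goal, operators, limit=10000):
    """Fallback: breadth-first search over operator applications."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier and limit > 0:
        limit -= 1
        state, path = frontier.popleft()
        if is_goal(state):
            return state, path
        for name, applicable, apply in operators:
            if applicable(state):
                nxt = apply(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None, None


def solve(start, is_goal, operators, macros):
    """Try remembered operator sequences first, then fall back on search."""
    for precondition, jump in macros:
        if precondition(start):
            candidate = jump(start)   # jump to the end of the remembered path
            if is_goal(candidate):
                return candidate, "macro"
    state, _ = search(start, is_goal, operators)
    return state, "search"


# Toy domain (hypothetical): integer states, two operators, goal state 10.
OPERATORS = [
    ("inc", lambda x: True, lambda x: x + 1),
    ("double", lambda x: True, lambda x: x * 2),
]
# Remembered path "inc, then double": applicable when it would land on 10.
MACROS = [(lambda x: (x + 1) * 2 == 10, lambda x: (x + 1) * 2)]
```

Starting from 4, the remembered sequence applies and the solver jumps directly to the
goal; starting from 3, it does not apply and the solver falls back on ordinary search.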

There are two sources of power in using EBG on successful search paths. The first 
source of power results from remembering useful patterns of inferences, so that search 
in future problems is biased towards paths that have been useful previously. This 
source of power rests on two assumptions: first, some patterns of operator applications 
are useful more frequently than others; second, the distribution of future cases that 
the problem solver will be presented with is reflected by the distribution of cases it 
has seen so far. The second source of power results from encapsulating the search 
paths, so that the problem solver can jump to the conclusion of a remembered pattern 
of inferences without constructing any of the intermediate search nodes. Briefly put, 
using EBG on successful search paths allows the problem solver to find good search 
paths quickly (search bias) and to travel down those paths quickly (encapsulation). 

4.1.1 Biasing Search Toward Previously Successful Paths 

The first potential for speedup from remembering successful search paths in the 
form of generalized rules is that the rules can be used to bias search towards paths 
that were successful before, thus reducing search. That bias will be more effective 
in improving performance the more paths the original problem solver explores that 
never lead to a solution. In fact, performance may actually deteriorate if too many
search paths lead to a solution, even if most of them lead to a solution only rarely. At
the end of the section we propose additional mechanisms to use with EBG
so that the search bias can improve performance when there are many paths that 
rarely lead to a solution, but few that never lead to a solution. 

Each time the problem solver checks the applicability of a generalized rule, it is 
as if the problem solver were exploring the search path which the rule generalizes. 
Thus, EBG alone biases search towards paths that led to solutions before and away
from those paths that have never led to a solution.

To illustrate the weakness of using EBG alone to bias the search, suppose that,
due to some manufacturing defect, component M1 causes 99% of the failures in the
polybox circuit and component M3 accounts for 1% of them. The troubleshooter is
presented with 100 cases, 99 that resulted from M1 failing and one that resulted from
M3 failing. It generates rule R1 for recognizing that the derivation of the conflict set
(M1 M2 A1) is applicable and uses R1 98 times. It generates rule R3 for recognizing
that the derivation of the conflict set (M2 M3 A2) is applicable but R3 never applies
again. Yet after solving those 100 "training instances," R1 and R3 have equal status.
In diagnosing future cases, R1 will be useful very frequently, while checking R3 will
be a waste of time in 99 cases out of 100.

More generally, EBG will bias search most effectively when many of the possible
search paths never lead to a solution. Remembering and using every successful pattern
of inferences amounts to representing recurrence by a single bit: "Have I seen a
problem like this at least once before?" Search in solving future problems is biased
toward paths that have been useful before and away from paths that have never
been useful before. In the worst case, when every possible search path has led to a
solution at least once before, all search paths will have equal status, and the search
degenerates into a blind generate and test. If the original problem solver was able
to do better than a blind search, use of EBG in this worst-case scenario could slow
down the problem solver considerably. In short, EBG will bias search most effectively
when there is a bimodality in the frequency with which search paths lead to goal
states: every search path should lead to a goal state either frequently or not at all.

One way that search could be biased more effectively is for the program to keep 
some information about how frequently generalized rules are applicable. Several 
strategies are possible, of which we outline two. First, the program could keep track 
of the exact frequency of applicability of each generalized rule. Using the frequency 
statistics it could order the checking of rules so that rules that were useful more 
frequently would be checked first. It could also "forget" rules that were useful too 
infrequently. Second, instead of keeping explicit statistics, it could keep a fixed 
number of generalized rules in a Least Recently Used queue, bringing a rule to the 
front of the queue when it is used and throwing out the least recently used rule when 
the queue is full. Section 4.4.3 describes more elaborate statistical mechanisms, 
actually implemented in [Min88], that take into account the cost of checking a rule 
and the benefits derived from its being useful, as well as how frequently it is useful. 
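
The second strategy, a fixed-size Least Recently Used queue of generalized rules, can
be sketched as follows. The class and method names are hypothetical, rules are modeled
simply as named predicates over a case, and the hit counter is included only to hint at
the first strategy of keeping explicit frequency statistics.

```python
from collections import OrderedDict


class RuleLibrary:
    """Fixed-size store of generalized rules with least-recently-used eviction."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.rules = OrderedDict()   # name -> predicate; first item is most recent
        self.hits = {}               # name -> number of times the rule applied

    def add(self, name, predicate):
        if name not in self.rules and len(self.rules) >= self.capacity:
            evicted, _ = self.rules.popitem(last=True)   # drop least recently used
            self.hits.pop(evicted, None)
        self.rules[name] = predicate
        self.rules.move_to_end(name, last=False)         # new rule goes to the front
        self.hits.setdefault(name, 0)

    def first_applicable(self, case):
        """Check rules front to back; move the rule that fires to the front."""
        for name, predicate in list(self.rules.items()):
            if predicate(case):
                self.hits[name] += 1
                self.rules.move_to_end(name, last=False)
                return name
        return None
```

In the M1/M3 scenario above, a library of this kind would keep R1 near the front of
the queue and would eventually discard R3 once newer rules displaced it.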

One possible improvement to the use of EBG that does not involve an additional
statistical mechanism is to make judicious choices as to which successful solution
paths should be packaged up into generalized rules. Bottom-up chunking [Ros83, 
RN86] is one method for making those choices. Bottom-up chunking assumes that 
there is a hierarchy of problem spaces. That is, the problem solver can set up new 
search spaces to solve sub-problems while it is working on a larger problem. The
bottom-up chunking method then chooses to encapsulate a successful search path
only if the problem solver did not need to create and solve any sub-problems while 
taking that search path. Thus, the first time the problem solver solves a difficult 
problem, it will create generalized rules (chunks) from the solution of the lowest-level 
subproblems. If it solves the same problem again, those generalized rules will allow 
the problem solver to avoid creating the lowest-level subproblems, and it will create 
generalized rules for the next level of problem solutions. In this way, the choice of 
which generalized rules to create from the solution of a particular problem depends 
not only on the current problem but on all of the previous problems presented to 
the problem solver. Chunking in SOAR, however, no longer uses the bottom-up 
approach. 

Summary 

The straightforward use of EBG to remember every successful search path will 
bias search to the extent that many paths never lead to solutions. Because EBG 
considers each case in isolation, it fails to distinguish patterns that were frequently 
useful from those that were rarely (though sometimes) useful. One implication of this 
is that there is no way of filtering out "non-operational" generalized rules as they 
are created: the utility of a generalized rule depends on characteristics of the whole 
distribution of examples, not just the characteristics of any one example. If there 
are many paths that lead to solutions rarely (but sometimes), EBG may need to be 
embedded in a learning system that pays attention to the whole distribution of cases. 

4.1.2 Encapsulating Patterns of Operator Applications 

A second potential source of speedup from generalizing successful search paths 
is that EBG encapsulates a pattern of operator applications. That is, it remembers 
only the weakest preconditions for a search path and its conclusion rather than the 
whole path. The encapsulation makes it possible to check whether a whole sequence 
of operators can be applied and will lead to a goal state, and then jump to that goal 
state, all without actually applying any of the operators.

Two factors can make checking the weakest preconditions and then jumping to
the final state more efficient than simply re-running all the operators. One factor
can make it less efficient. Some operators compute their conclusions based on the
variable bindings for their left-hand sides (e.g., M1's behavior rule performs one
multiplication to compute its conclusion). We call that computation the behavior costs
of the operator.^ As we will see, all of the behavior costs of an operator sequence
are paid when checking the weakest preconditions. On the other hand, the overhead
costs of matching the operators' left-hand sides and of recording their conclusions
are eliminated when checking the weakest preconditions. The first positive factor,
then, is the savings in overhead: the cost of matching the operators' preconditions
and constructing intermediate search states. A second positive factor is that the
weakest preconditions may be simplified, saving some of the behavior costs that are
encapsulated in those preconditions. On the negative side, checking several generalized
rules may repeatedly incur the same behavior cost that would have been shared
among several search paths, had they not been encapsulated. Overall, the smaller the
cost of evaluating the bodies of operators, relative to the cost of matching operator
preconditions and storing results, the greater the benefits from encapsulation will be.

^In applications involving only logical inference from boolean assertions (e.g. Winston's cup
example [WBKL83]), the behavior cost of every operator is zero, because operators' right-hand
sides are not functions of the variables on their left-hand sides.

Saving Overhead 

Generalized rules encapsulate the firing of several behavior rules, thereby saving
the overhead of running them. Given specific values for A, B, C, and D, consider
the difference between evaluating the expression (+ (* A C) (* B D)), from rule
R1, and propagating the inputs through M1, M2 and A1 of the polybox circuit. In
either case, two multiplications and one addition are performed. However, there is 
some overhead cost associated with propagating the values through the components. 
One source of overhead is rule triggering: each operator's preconditions are checked 
and its variables are bound. A second source of overhead is recording the conclusion 
of the operator (e.g., that the output of M2 is 6). Depending on how high these 
overhead costs are, and how long the encapsulated operator sequences are, saving 
the overhead costs of the encapsulated behavior rule firings may be significant. 
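
The contrast between the two ways of computing the same value can be made concrete
with a small sketch. It is an illustration under simplifying assumptions: the
dictionary merely stands in for the database in which conclusions are recorded, and
rule triggering is not modeled at all.

```python
def check_encapsulated(a, b, c, d):
    """Evaluate R1's expression directly: no intermediate values are recorded."""
    return a * c + b * d


def propagate(a, b, c, d):
    """Fire one behavior rule per component, recording each conclusion."""
    recorded = {}                       # stands in for the assertion database
    recorded["M1.out"] = a * c          # M1's behavior rule
    recorded["M2.out"] = b * d          # M2's behavior rule
    recorded["A1.out"] = recorded["M1.out"] + recorded["M2.out"]   # A1's behavior rule
    return recorded["A1.out"], len(recorded)
```

Both functions perform two multiplications and one addition, but `propagate` also
records three conclusions, which is the kind of overhead that encapsulation avoids.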



Expression Simplification 

A second potential source of efficiency is simplifying a generalized rule's
preconditions, thus saving some of the behavior costs as well as the overhead costs of
re-running the operator sequence. Consider the left-hand side of R5, from the adder
circuit. S3 was propagated back through X3, assuming X3's other input was 0, and
then through N24, yielding the expression (INVERT (XOR ?S3 0)) which appears
in the precondition (NOT (= (INVERT (XOR ?S3 0)) 0)); evaluating the equivalent
precondition (NOT (= ?S3 1)) would save the behavior costs of firing X3's and N24's
behavior rules. The implemented program does not perform any such behavior sim- 
plifications, partly because the behavior costs for adders, multipliers, and gates are 
very small. Although unguided expression simplification is a hard problem, it would 
be worth using a MACSYMA-like system to simplify the generalized rules' precondi- 
tions as much as possible, whenever the behavior costs of operators are high. 
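
The equivalence of the two forms of the precondition can be confirmed by exhaustive
evaluation over the two possible bit values. The helper functions below are
hypothetical stand-ins for the gate behaviors, assuming single-bit signals.

```python
def invert(bit):
    """Behavior of an inverter on a single bit."""
    return 1 - bit


def xor(a, b):
    """Behavior of an exclusive-or gate on single bits."""
    return a ^ b


def original_precondition(s3):
    """(NOT (= (INVERT (XOR ?S3 0)) 0))"""
    return not (invert(xor(s3, 0)) == 0)


def simplified_precondition(s3):
    """(NOT (= ?S3 1))"""
    return not (s3 == 1)
```

The two predicates agree for both values of ?S3, so the simplified form can replace the
original without changing when the rule applies.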

Additional Behavior Costs

One negative effect of encapsulating patterns of behavior rule firings is that the
same behavior rule firing may be encapsulated in several generalized rules. For example,
in deriving conflict sets by propagating values through the polybox circuit,
the prediction of the value 6 at the output of M1 was used in deriving both conflict
sets, (M1 M2 A1) and (M1 M3 A1 A2). Thus, the expression (* A C) appears
in both R1 and R2 and is evaluated twice in diagnosing a new case, yet the same
expression is evaluated only once when deriving the two conflict sets by propagating
values, because the product is stored as the output of M1. More generally, checking
the preconditions of the generalized rules will repeatedly incur the behavior costs of
any shared rule firings.

Sufficient repetition of behavior costs in the preconditions of generalized rules
makes it worthwhile to share those costs. Analogous to the idea of rete networks,
subexpressions common to more than one generalized rule could be computed just
once, the smallest ones first. That would eliminate the repetitive computation, but 
would require storing the intermediate results of subexpression evaluation. Keep in 
mind that one overhead saving from encapsulating proof trees is avoiding the storage 
of intermediate results (the other is avoiding the binding of variables for the opera- 
tors). However, if the behavior costs of the operators are sufficiently high, eliminating 
repetitive computation of those behavior costs will be worth the re-introduction of 
the overhead of storing intermediate results. 

4.1.3 Finding All Solution States 

All of the above discussion about biasing search and encapsulation becomes moot 
if the problem solver has to find all of the goal states in a finite search space, rather 
than a single goal state. In searching for all of the goal states, there is no reason to 
check generalized rules unless the program is assured that its set of generalized rules 
covers all of the possible solution paths. To see this, note that after finding some 
solutions using generalized rules, the program still exhaustively explores the search 
space to find other potential solution states. Figure 4.1 graphically illustrates this 
problem. Even after S1 and S2 are found using generalized rules, nodes A, B and C
must be constructed and visited in order to find S3 and S4 and check whether they 
are solutions. Thus, the problem solver loses both the search bias effect (it explores 
the rest of the search space) and the encapsulation effect (it constructs A and B). 
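
The point can be illustrated on a small search tree. The tree below is hypothetical
(it is not a transcription of Figure 4.1), as are the choice of goal states and the
function name; the sketch only shows that knowing some solutions in advance does not
spare the problem solver from constructing the interior nodes.

```python
# Hypothetical search tree: each node maps to its children.
TREE = {
    "root": ["A", "B"],
    "A": ["S1", "C"],
    "B": ["S2", "S4"],
    "C": ["S3"],
}
GOALS = {"S1", "S2", "S3"}   # in this toy tree, S4 turns out not to be a solution


def find_all(tree, root, goals, already_found=()):
    """Exhaustive depth-first search; returns (solutions, nodes constructed)."""
    solutions = set(already_found)
    constructed = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in already_found:
            continue             # no need to rebuild a solution we jumped to
        constructed.append(node)
        if node in goals:
            solutions.add(node)
        stack.extend(reversed(tree.get(node, [])))
    return solutions, constructed
```

Even when S1 and S2 are supplied as already found, the search still constructs A, B
and C in order to reach S3 and S4.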

4.1.4 Summary of Problem Characteristics 

There are two sources of power in Explanation-Based Generalization of successful 
problem solving episodes. First, the generalized rules can act as remembered patterns
of operator applications to bias the problem solver toward patterns that have

Figure 4.1: The problem solver must find all solutions. S1 and S2 are found using
generalized rules; nodes A, B, and C must be constructed and visited in order to find
S3 and S4 and check whether they are solutions.

been useful in solving previous problems. Second, the generalized rules encapsulate 
the patterns of operator applications, so that the overhead cost of checking opera- 
tor preconditions and constructing intermediate search nodes can be eliminated, at 
the expense of evaluating the bodies of some operators more than once. This sec- 
tion summarizes the characteristics of problems for which EBG alone will produce a 
significant improvement in problem solving speed. 

Bimodal Distribution The more potentially useful patterns of operator applica- 
tions that are never actually usefiJ in solving a problem presented to the prob- 
lem solver, the more effective EBG will be in improving performance. 

Inexpensive Rule Bodies The smaller the cost of evaluating the bodies of op- 
erators, relative to the costs of matching operator preconditions and storing 
results, the more effective EBG will be in improving performance. 

Search For One Solution The problem solver's task must be to search for a single 
goal state, not all goal states, unless it can be sure that its generalized rules 
encapsulate all of the possible search paths. 

4.2 Generalizing Explanations of Failures 

A second popular use of explanation-based generalization in problem solving is to 
encapsulate the explanation of why a search node is inconsistent with a goal state. 
A generalized rule may be used either to speed up the process of checking whether a 
node satisfies the goal conditions (by ruling it out quickly), or to prune other nodes 
from the search space, or both. This section considers each of the two uses in turn. 

4.2.1 Finding the Failure Faster 

One way that a generalized explanation of a search node inconsistency can be 
used is to reduce the effort the problem solver expends in testing other search nodes 
for consistency. The problem solver can use a generalized rule to identify a search 
node as inconsistent faster than it would have been able to find the inconsistency 
without the generalized rule. 

This is effectively using EBG to encapsulate successful patterns of operator applications 
(discussed in Section 4.1), provided we reformulate proving the inconsistency 
of a node with the goal state as a search problem in a separate space. The operators 
in this search space are inference rules, the initial state is a set of assertions 
about the node from the original space, and the goal is to derive a contradiction. The 
generalized rule that is created encapsulates a successful deduction of a contradiction. 

We can draw on the analysis of Section 4.1 to characterize when speed will increase 
by using EBG to generalize the deduction of the inconsistency of a search node. 
First, in order for EBG to bias search, there must be a bimodal distribution in 
the frequency of applicability of derivations of inconsistency: derivations must be 
applicable frequently or not at all. Second, to benefit from encapsulation, the bodies 
of the inference rules used to derive the inconsistency of a search node must be 
inexpensive to evaluate. To the extent that these hold, EBG can help the problem 
solver to prove failures faster. 

4.2.2 Reducing Search 

A second way to use generalized rules that identify inconsistent nodes is to reduce 
search in the original space. First, by assuming that inconsistent search nodes never 
lead to goal states, the problem solver can cut off the entire sub-space reachable 
from the node identified as inconsistent. That excision will not improve performance, 
however, if the original problem solver was able to cut off the same sub-space. Second, 
by using explicit simplifying assumptions, the problem solver may be able to ignore 
an even larger sub-space. 

The inconsistency of a search node can be used to cut off search only if inconsistency 
is monotonic (i.e. no goal state can ever be reached from a state that is 
inconsistent with the goal [MB87]). Otherwise, the inconsistency of a search node 
justifies avoiding only that node during the search, not any of its successors. Planning 
problems normally do not satisfy the monotonicity of inconsistency criterion, while 
constructive problems such as design and local constraint satisfaction problems do 
satisfy it. In planning problems, some operators tend to recover from the effects of 
others, so even if a state is far from satisfying the goal criteria, the problem solver 
cannot assume that no goal state is reachable from there. In some constructive problems, 
on the other hand, such as design, additional operator applications simply add 
to the design, rather than change it, so a partial design that is inconsistent cannot 
lead to a consistent complete design. Local constraint satisfaction problems [Mac87], 
where the problem solver's task is to assign labels to a number of objects without 
violating a set of constraints among the labels, also satisfy the criterion: if a partial 
labeling is inconsistent, every complete labeling resulting from it will also be inconsistent. 

Cutting off a Search Sub-tree 

Assuming the monotonicity of inconsistency, a generalized rule that identifies a 
search node as inconsistent can be used to excise the entire sub-tree reachable from 
that node. This can offer a significant savings if the original problem solver would 
have explored that sub-tree. 

However, in order to make a fair comparison, the original problem solver should 
also be allowed to make use of the monotonicity of inconsistency to cut off search 
at nodes that it proves inconsistent. As we explore below, in solving constraint 
satisfaction problems and performing circuit diagnosis, a good problem solver can and 
should check the consistency of each node as it is visited. In that case a generalized 
rule cannot significantly reduce the search. It may, however, still be useful in proving 
quickly that individual nodes are inconsistent, as in Section 4.2.1. 

The original problem solver may not be able to exploit the monotonicity of in- 
consistency if checking for inconsistencies in partial solutions is much more expensive 
than checking for inconsistencies in complete solutions. Consider analog circuit de- 
sign to meet certain global speed and power usage requirements [Wil88]. It is very 
expensive to check a partial design for consistency with the requirements, but com- 
pleted designs can be simulated. Hence, using EBG to explain failures of circuit 
designs may be very useful in speeding up later design performance. 

In solving local constraint satisfaction problems, however, the problem solver 
should check the consistency of partial labelings rather than waiting until it has 
constructed complete labelings. Consider, for example, the Failsafe program [MB87], 
which learns from explanations of its failures in solving simplified floor planning 
problems. Its task is to place a given set of rectangles (rooms) of given sizes onto a 
larger rectangle (the floorspace), such that the placement satisfies certain constraints, 
such as rooms not overlapping. This can be viewed as a constraint satisfaction 
problem, where each room must be labeled with its position on the floor. The program 
uses a generate and test exploration of the problem space, placing all of the rooms and 
then checking if the placement satisfies all of the constraints. Failsafe's performance 
improves because the generalized rules it constructs from the explanations of the 
inconsistency of room placements prune a large portion of the search space that 
the generate and test problem solver would have explored. If, however, the original 
problem solver had checked the consistency of partial labelings, the generalized rules 
would prune only portions of the search space which the original problem solver 
would have avoided. Hence, using EBG to identify inconsistent partial solutions 
to constraint satisfaction problems only eliminates search subtrees that the original 
problem solver should not have explored in any case. 
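
The contrast between generate and test and checking partial labelings can be 
sketched on a toy labeling problem. This is only an illustration, not the Failsafe 
program: the function names, the tuple encoding of labelings, and the "all labels 
differ" constraint are assumptions made for the example. 

```python
from itertools import product

def consistent(labels, constraints):
    """True if no constraint among the objects labeled so far is violated."""
    return all(ok(labels[i], labels[j])
               for (i, j, ok) in constraints
               if i < len(labels) and j < len(labels))

def generate_and_test(n, domain, constraints):
    """Build every complete labeling, then test it. Returns (solutions, tests)."""
    tests, solutions = 0, []
    for labeling in product(domain, repeat=n):
        tests += 1
        if consistent(labeling, constraints):
            solutions.append(labeling)
    return solutions, tests

def backtrack(n, domain, constraints):
    """Check each partial labeling as it is built and prune below inconsistent ones.
    Returns (solutions, nodes visited)."""
    visited, solutions = 0, []

    def extend(partial):
        nonlocal visited
        visited += 1
        if not consistent(partial, constraints):
            return  # monotonicity of inconsistency: no extension can be consistent
        if len(partial) == n:
            solutions.append(partial)
            return
        for value in domain:
            extend(partial + (value,))

    extend(())
    return solutions, visited

# Hypothetical problem: 4 objects, 4 labels, every pair must receive different labels.
N = 4
DOMAIN = range(4)
CONSTRAINTS = [(i, j, lambda a, b: a != b) for i in range(N) for j in range(i + 1, N)]

gt_solutions, gt_tests = generate_and_test(N, DOMAIN, CONSTRAINTS)
bt_solutions, bt_visited = backtrack(N, DOMAIN, CONSTRAINTS)
```

Both procedures find the same labelings, but the backtracking version never enters 
the subtrees that a learned "inconsistent partial labeling" rule would prune, which 
is why such rules add little once the base problem solver checks partial labelings. 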

A second example of the original problem solver being able to cut off search 
at inconsistent nodes is provided by our use of EBG to generalize the derivation 
of contradictory values during model-based diagnosis. We formulate the diagnostic 
engine's task as search through the space of subsets of components (contexts), with 
the goal of finding the largest subsets that are consistent with the observations. The 
search operators are not the component behavior rules, but rather are operators 
that add one additional component to a context. The component behavior rules 
are used to derive contradictory values in a context, thus identifying the context as 
inconsistent. Generalized rules also identify inconsistent contexts. The monotonicity 
of inconsistency criterion is satisfied: if M1, M2, and A1 can predict contradictory 
values at F, any larger set of components can predict the same values at F. However, 
a diagnostic engine should check for the inconsistency of contexts as it visits them, 
and the one we used does so. If that is the case, the diagnostic engine cuts off search 
below inconsistent nodes, with or without the generalized rules. Thus, the diagnostic 
engine of Chapter 2 is another problem solver for which the generalized rules don't do 
any better than the original problem solver at cutting off search below inconsistent 
nodes. As will be described shortly, however, the generalized rules allow it to cut off 
search above inconsistent nodes, which the original problem solver could not do. 

Cutting off Search Above the Failure Node 

By using simplifying assumptions, it may be possible to combine the results of 
two or more generalized rules to cut off search above the nodes that generalized rules 
identify as inconsistent. Section 3.5 discussed using the single-fault assumption to 
intersect the conflict sets identified using generalized rules. The single-fault assumption 
in diagnosis is one example of a simplifying assumption that allows the problem 
solver to combine the results of several generalized rules. 

The single-fault assumption allows the diagnostic program to cut off search even 
before it reaches the contexts the generalized rules identify as inconsistent. For example, 
in Figure 4.2, if two generalized rules identify the conflict sets (B1 B2 B3 B4) 
and (B1 B2 B57) for a hypothetical circuit with 57 components, the augmented diagnostic 
engine intersects the two conflict sets to reduce the suspect set to (B1 B2). 
When propagating values, it never explores any contexts containing both B1 and B2 
(i.e. it never propagates values using both B1 and B2). 

Figure 4.2: With or without EBG, the problem solver can cut off the subspaces 
rooted at the inconsistent contexts (B1 B2 B3 B4) and (B1 B2 A57). Using EBG 
and the single-fault assumption to intersect inconsistent contexts, the problem solver 
can avoid the entire subspace rooted at (B1 B2). 
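
The intersection step can be sketched as follows. This is a minimal illustration 
rather than the diagnostic engine's actual code: the function names and the use of 
Python sets for conflict sets and contexts are assumptions. 

```python
from functools import reduce

def single_fault_suspects(conflict_sets):
    """Under the single-fault assumption the one faulty component must appear in
    every conflict set, so the suspects are the intersection of all of them.
    Expects at least one conflict set."""
    return reduce(frozenset.intersection, map(frozenset, conflict_sets))

def can_skip(context, suspects):
    """A context (a set of components assumed to be working) that contains every
    suspect is inconsistent, so it need not be explored."""
    return suspects <= frozenset(context)

# Conflict sets from the example in the text.
conflicts = [{"B1", "B2", "B3", "B4"}, {"B1", "B2", "B57"}]
suspects = single_fault_suspects(conflicts)
```

Here `suspects` is the set containing B1 and B2, and `can_skip` is true for any 
context that includes both of them, which is the subspace rooted at (B1 B2). 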

Using the single-fault assumption to intersect inconsistent contexts is a special 
case of the use of explicit simplifying assumptions to combine the results of generalizations. 
Explicit ONEOFs and the following hyperresolution inference rule, taken 
from [dKW86], provide a more general framework for combining the results of several 
generalized rules that identify inconsistent search nodes. 

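\[
\frac{\mathrm{ONEOF}(A_1, A_2, \ldots) \qquad
      \text{CONFLICT-SET } \alpha_i \ \text{ where } A_i \in \alpha_i
      \text{ and } A_{j \neq i} \notin \alpha_i \text{ for all } i}
     {\text{CONFLICT-SET } \bigcup_i \, [\alpha_i - \{A_i\}]}
\]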

The single-fault assumption is equivalent to stating that, given any pair of components, 
one of them must be working. That is, there is a ONEOF for every pair 
of components. Instantiating the hyperresolution rule above with the assumption 
(ONEOF B4 B57) and the conflict sets (B1 B2 B3 B4) and (B1 B2 B57) identifies 
the conflict set (B1 B2 B3). This, together with conflict set (B1 B2 B57) and the 
assumption (ONEOF B3 B57), can be used to identify the conflict set (B1 B2). The 
effect is the same as intersecting the two original conflict sets, but the process is less 
efficient, because the program may try many choices of ONEOFs before finding the 
right ones to use with the hyperresolution rule. 
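
The two hyperresolution steps in this example can be traced with a short sketch. 
It is an illustrative encoding only: the function name, the argument layout (one 
conflict set per ONEOF member, in the same order), and the use of `None` to signal 
that the rule does not apply are assumptions, not part of [dKW86]. 

```python
def hyperresolve(oneof, conflicts):
    """Apply the hyperresolution rule once.

    oneof:     tuple (A_1, ..., A_n), at least one of which is working.
    conflicts: conflict sets alpha_1, ..., alpha_n, where alpha_i must contain
               A_i and none of the other A_j.
    Returns the union of alpha_i - {A_i}, or None if the side conditions fail.
    """
    if len(oneof) != len(conflicts):
        return None
    for i, (a, alpha) in enumerate(zip(oneof, conflicts)):
        if a not in alpha:
            return None
        if any(b in alpha for j, b in enumerate(oneof) if j != i):
            return None
    result = frozenset()
    for a, alpha in zip(oneof, conflicts):
        result |= frozenset(alpha) - {a}
    return result

c1 = {"B1", "B2", "B3", "B4"}
c2 = {"B1", "B2", "B57"}

step1 = hyperresolve(("B4", "B57"), [c1, c2])     # conflict set (B1 B2 B3)
step2 = hyperresolve(("B3", "B57"), [step1, c2])  # conflict set (B1 B2)
```

The sketch applies the rule to ONEOFs that are already known to be the right ones; 
a program without that knowledge would have to search over candidate ONEOFs, which 
is the inefficiency noted above. 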

While hyperresolution is not as efficient as set intersection for exploiting the 
single-fault assumption, hyperresolution is a more general framework. Hyperresolution can 
exploit weaker simplifying assumptions than the single-fault assumption. The simplifying 
assumptions, together with hyperresolution, could be used to cut off search 
above the search nodes that the generalized rules identify as inconsistent. Such a use 
of generalized rules would reduce the search space that the problem solver explores, 
potentially improving its performance. 

4.2.3 Summary of Problem Characteristics 

There are two potential sources of power in using EBG to encapsulate explanations 
of the inconsistency of search nodes. First, finding the inconsistency of a search 
node may be very expensive, in which case encapsulating a successful derivation of 
an inconsistency may reduce that cost. In that case, generalizing explanations of 
failures is the same as generalizing successful patterns of inferences in the space of 
derivations of inconsistencies. Second, knowing the inconsistency of one search node 
may enable the problem solver to ignore a large portion of the search space. Several 
characteristics of problems make them appropriate for generalizing the explanations 
of failures to reduce the search space: 

Monotonicity of Inconsistency If the inconsistency of a search node with the 
goal conditions implies that no goal state can be reached from it, the problem 
solver can cut out the search sub-spaces rooted at that node. Consistent labeling 
problems and constructive problems such as design may have this characteristic, 
but planning problems generally do not. 

Relative Cost of Checking Inconsistency Checking for the inconsistency of a 
partial solution should be much more expensive than checking the consistency 
of a complete solution. Otherwise, the original problem solver will check the 
consistency of each node as it constructs it and the generalized rules will prune 
only parts of the search space that the problem solver would not have explored 
in any case. 

Simplifying Assumptions If reasonable simplifying assumptions (such as the single- 
fault assumption for circuit diagnosis) can be used to combine the knowledge 
that several search nodes are inconsistent, EBG may speed up problem solving 
by cutting off search at nodes above the nodes identified as inconsistent by the 
generalized rules. 

4.3 A Note on Parallelism 

Some researchers have suggested that parallel processing will reduce the marginal 
cost of checking additional generalized rules to zero. If the cost of checking a gener- 
alized rule were zero, then the analysis in this chapter would be moot: it does not 
matter how much benefit a generalized rule provides because there is no cost to using 
it. The argument, however, is not correct, because it considers only clock time, and 
not processor time, as a resource. 

While the analysis in this chapter is phrased in terms of time costs, it could 
easily be generalized to refer to arbitrary resource costs. Current research in parallel 
processing considers processor time an important resource. As evidence, consider 
that some complexity analysis of parallel algorithms is parameterized by both n, the 
size of the problem, and p, the number of processors to be used [Sch80]. Other 
complexity analysis of parallel algorithms explicitly gives both the step-complexity 
(clock time) and the element-complexity (the total amount of processor time used) 
[Ble88]. 

If processor time is considered a resource, then the cost of checking all of the gen- 
eralized rules will always grow with the number of generalized rules, and the analysis 
in this chapter is relevant. Another way to look at this is to note that it is mislead- 
ing to use a parallel processor to check the generalized rules without considering if 
the original problem solver could have made better use of those processors. Even 
with a parallel processor, we must be careful about whether the generalized rules are 
improving or reducing performance. 

4.4 Related Work 

4.4.1 The Causes of Expensive Generalized Rules 

Tambe and Newell [TN88] have analyzed characteristics of problem spaces that 
lead to the creation of individual generalized rules (chunks) that are expensive to 
check. Their research is useful since one factor that affects the utility of using gen- 
eralized rules is the cost of checking them. Expensive generalized rules, however, 
may be the ones that are most frequently applicable, or the ones that provide the 
most benefit when they are applicable. In this document we have tried to identify 
characteristics of the problem space which will lead to the creation of rules that have 
positive utility overall. 

4.4.2 Operationality Criterion 

Most previous work on evaluating the utility of EBG has focused on finding an 
"operationality criterion" for selecting only those generalized rules that will speed up 
performance. The analysis in this chapter is novel in that it recognizes that no such 
criterion is possible, and instead uses the factors affecting operationality (recurrence, 
manifestness, and exploitability) to characterize the problems for which EBG will 
improve performance. 

The search for a good operationality criterion led to identifying first manifestness, 
then recurrence and exploitability, as the key factors affecting the utility of a generalized 
rule. Originally, the only issue considered was how manifest the generalizations 
would be. Early work sought simple restrictions on which predicates could appear 
in the preconditions of generalized rules [MKKC86]. Unfortunately, restrictions on 
predicates turned out to be insufficient to capture the requirement that generalized 
rules be checked efficiently. For example, the predicate PROVABLE can be evaluated 
efficiently on the theorem "2 + 2 = 4" but not so easily on Fermat's last theorem 
[DM86]. Later work has fallen back on measuring the cost of checking generalized 
rules after they are constructed, as we did in Chapter 3. 

More recently, work on PRODIGY identified the notions of recurrence and ex- 
ploitability [MCE+87] and research on MetaLex [Kel87a, Kel87b] focused on ex- 
ploitability. The recurrence of a rule depends not only on how many cases it applies 
to (the generality criterion of [Seg87]), but also the frequency with which those cases 
occur in the distribution of cases presented to the problem solver. Keller pointed out 
[Kel87b] that the degree to which a problem solver can exploit a generalized rule may 
change over time as the problem solver acquires more rules. 

The importance of recurrence and exploitability makes it clear that no "operationality 
criterion" can be constructed that can guarantee that all of the generalized 
rules that are remembered will have positive utility in speeding up problem solving. 
Generalized rules are created from single cases, while recurrence can be measured 
only by examining the entire distribution of cases. Hence, there is no "operationality 
criterion" that can filter generalized rules as they are created. 

4.4.3 Forgetting Rules With Low Utility 

The PRODIGY system attempts to evaluate the utility of each generalized rule 
it learns and then "forget" rules that have negative utility. PRODIGY takes into 
account not only how recurrent a rule is, as suggested in Section 4.1.1, but also how 
manifest and how exploitable it is. The utility of a rule is expressed as the time saved 
by using it minus the cost of checking it. The expected time saved is the frequency 
with which it applies times the expected savings from using the rule in those cases 
to which it is applicable. 
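
Written as a formula, the utility estimate just described takes the following form. 
The symbols are introduced here only for illustration and are not PRODIGY's own 
notation: f_r is the frequency with which rule r applies, s_r is the expected 
savings when it applies, and c_r is the cost of checking it. 

\[
U(r) \;=\; f_r \cdot s_r \;-\; c_r
\]

As a purely hypothetical example, a rule that applies to 10% of problems, saves 
50 ms when it applies, and costs 2 ms to check has U = 0.1 x 50 - 2 = 3 ms and is 
worth keeping. If the same rule applied to only 2% of problems, its utility would 
be 0.02 x 50 - 2 = -1 ms, and it would be forgotten. 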

Two difficulties arise in calculating the expected utility of a generalized rule. One 
is the expense of gathering statistics about how frequently rules are applicable. The 
PRODIGY system addresses this issue by learning and evaluating utility only while 
solving a set of training examples. Hence, the problem solver avoids the expense of 
keeping statistics during normal performance. A second difficulty arises in measuring 
the benefit of an individual rule, when it is applicable, because the rules interact 
with each other. For example, the benefits from the applicability of a generalized 
rule that identifies a conflict set in diagnosis depend on how many suspects have 
already been exonerated by other generalized rules. PRODIGY finesses this second 
difficulty by estimating the benefits of a rule's applicability. Minton's results [Min88] 
demonstrate that PRODIGY estimates the utility of generalized rules well enough 
to improve performance by throwing out some rules. His results also suggest that 
better utility estimates are possible, and would improve performance even more. 

Of course, filtering the generalized rules assumes that some rules will have positive 
utility and others will have negative utility. The analysis in this chapter points out 
that there are characteristics of the problem solver and the distribution of problems 
that will affect the utility of all of the generalized rules. 

4.5 Conclusion 

This chapter analyzed the sources of power in two common uses of EBG: gen- 
eralizing successful problem solving episodes, and generalizing the explanations that 
search nodes are inconsistent. 

There are two sources of power in generalizing successful problem solving episodes. 
First, the problem solver's search is biased toward search paths that have led to goal 
states in solving previous problems. In order to bias search effectively, however, EBG 
may need to be supplemented by statistical information about the frequency with 
which patterns of inferences are applicable. Second, the generalized rules encapsulate 
a pattern of operator applications, so that using a generalized rule saves the overhead 
cost of triggering operators and storing intermediate results that would be incurred 
in performing the pattern of operator applications. If, however, it is expensive to 
evaluate the bodies of search operators, the overhead savings from encapsulation 
may be outweighed by the cost of checking the generalized rules. A final caveat is 
that generalization of successful problem solving episodes may accelerate a search for 
a single solution state, but not a search for all of the solution states. 

A problem solver can make two uses of generalized rules that identify search 
nodes as inconsistent with the goal state. First, the generalized rules may reduce the 
cost of proving that a search node is inconsistent; the generalized rule, then, is an 
encapsulated successful search path in a different search space, the space of proofs of 
inconsistency of a search node. Second, the problem solver may use the generalized 
rules to cut off search in the original search space. The problem solver can cut off 
search below a node that a generalized rule identifies as inconsistent, if a search node 
being inconsistent implies that no goal state can be reached from it. Gains from 
this use of EBG are often misleading, however, since a good problem solver will 
normally cut off the same part of the search space even without the generalized rule. 
The problem solver may also be able to use several generalized rules, together with 
simplifying assumptions (such as the single-fault assumption in circuit diagnosis), to 
cut off search above the inconsistent nodes, which the original problem solver can not 
do. 



Chapter 5 

Similar = Same Fault Hypothesis 



Chapters 2 and 3 analyzed one dimension of similarity in detail. The next two chapters 
explore alternative definitions of similarity and the learning methods that arise 
from them. In this chapter, we extend the diagnostic task to include the identification 
of specific misbehaviors for components. Then, we define two sets of observations as 
similar if the same misbehavior of the same component can explain the symptoms of 
both. For example, the observations in both of the cases presented in the introduction 
(repeated as Figure 5.1) can be explained by the first bit of M1's output being 
stuck low. In the first case, M1 outputs 4 as the product (instead of 6), which A1 
adds to the 6 predicted at Y to produce 10 at F, which agrees with the observation. 
In the second case, M1 outputs 0 (instead of 2), which A1 adds to the 5 predicted at 
Y to produce 5 at F, which agrees with the observation. 
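
The arithmetic in these two cases can be checked with a few lines of code. The 
helper below is an assumed Python rendering of a stuck-at fault on a binary-encoded 
value, with bit 0 taken to be the least significant bit; it is a sketch for 
illustration, not the program's implementation. 

```python
def decimal_stuck_at(bit, value, number):
    """Return `number` with binary digit `bit` (0 = least significant) forced to `value`."""
    mask = 1 << bit
    return number | mask if value == 1 else number & ~mask

# Case 1: M1 should output 6, Y is 6, and F is observed to be 10.
case1_f = decimal_stuck_at(1, 0, 6) + 6

# Case 2: M1 should output 2, Y is 5, and F is observed to be 5.
case2_f = decimal_stuck_at(1, 0, 2) + 5
```

With the first bit stuck low, 6 (binary 110) becomes 4 and 2 (binary 010) becomes 
0, so the same fault hypothesis reproduces the observed value of F in both cases. 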

One way to make manifest such a similarity between cases is to generalize the 
reasoning process used to check the consistency of a fault hypothesis in one case, 
then check if the generalized rule is applicable to the other case. This is another 
use of the EBG technology for generalizing patterns of inferences. We also propose 
a technique called lifting for creating generalized rules that check the consistency of 
fault hypotheses regardless of the reasoning process used by the diagnostic program. 
A very interesting, but as yet unimplemented, idea is to use a design verification 
to guide simplification of the preconditions in the lifted rules, to make them more 
efficient. 



5.1 Fault Hypotheses 

Thus far in this thesis, the troubleshooter has not considered component failure 
modes. It has been concerned only with identifying the components for which a 
misbehavior of some kind could account for all of the symptoms. Some kinds of 
component misbehavior are more plausible than others. For example, if adder A1 is 
implemented as a single TTL chip, it would fail in predictable ways (e.g. pins coming 
loose) that yield predictable misbehaviors (stuck-at faults). 

Figure 5.1: The Polybox Circuit. Both of the sets of observations shown are consistent 
with the first bit of A1's upper input being stuck low. 

We now extend the diagnostic engine's task to finding specific fault hypotheses 
for the candidate components that it finds. The diagnostic engine is provided with 
a list of the common modes of failure of each of the components in the device it 
diagnoses. After it finds the candidate set, it proposes as a fault hypothesis each of 
the modes of failure that it knows about for each candidate component and checks 
whether each hypothesis can explain all of the device behavior. The program outputs 
the fault hypotheses that are confirmed. 



5.2 Generalizing Fault Envisionments 

Given the observations of Figure 5.1(a), M1 and A1 are the single-fault candidates. 
The program then looks up the known potential misbehaviors for M1 and A1 and 
checks each for consistency with the observations. One possibility is that the first 
bit of M1's output is stuck-at 0. The program performs a simulation, referred to as 
a fault envisionment, to determine whether this is a consistent fault hypothesis. It 
propagates the inputs through the components, computing the hypothesized misbehavior 
for M1 and correct behaviors for the other components (see Figure 5.2). This 
yields values at the circuit outputs. In this case, the predicted outputs are consistent 
with the observed outputs, so the fault hypothesis is consistent. 

Another possibility is that the zeroth bit of M1's output is stuck-at 1. In that 
case, a fault envisionment would predict the value 7 at X, whence 13 at F, which 
contradicts the observed value of 10. Hence, that fault hypothesis is eliminated. 

Figure 5.2: Envisioning the hypothesis that the first bit of M1's output is stuck-at 0. 
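
A fault envisionment of this kind can be sketched as a small simulation. Several 
things here are assumptions rather than facts from the text: the polybox wiring 
(M1 = A*C, M2 = B*D, M3 = C*E, A1 = X+Y, A2 = Y+Z), the input values, the observed 
value at G, and the encoding of a fault hypothesis as a (component, bit, value) 
triple. Only the values 6 at X and Y and the observation of 10 at F come from the 
example above. 

```python
def decimal_stuck_at(bit, value, number):
    """Return `number` with binary digit `bit` (0 = least significant) forced to `value`."""
    mask = 1 << bit
    return number | mask if value == 1 else number & ~mask

def envision(inputs, fault=None):
    """Propagate inputs through the (assumed) polybox circuit.
    `fault` is None or (component, bit, value), applied to that component's output."""
    def output(component, correct_value):
        if fault is not None and fault[0] == component:
            _, bit, value = fault
            return decimal_stuck_at(bit, value, correct_value)
        return correct_value

    x = output("M1", inputs["A"] * inputs["C"])
    y = output("M2", inputs["B"] * inputs["D"])
    z = output("M3", inputs["C"] * inputs["E"])
    return {"F": output("A1", x + y), "G": output("A2", y + z)}

def is_consistent(inputs, observed, fault):
    """A fault hypothesis is consistent if the envisioned outputs match the observations."""
    return envision(inputs, fault) == observed

inputs = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3}   # assumed input values
observed = {"F": 10, "G": 12}                        # G = 12 is an assumption

bit1_stuck_at_0 = is_consistent(inputs, observed, ("M1", 1, 0))
bit0_stuck_at_1 = is_consistent(inputs, observed, ("M1", 0, 1))
```

Under these assumptions the first hypothesis is confirmed (X becomes 4, so F is 10) 
and the second is eliminated (X becomes 7, so F would be 13). 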



By encapsulating particular fault envisionments, EBG can find sufficient conditions 
for inferring that a fault hypothesis is consistent or sufficient conditions for 
inferring that it is inconsistent with the observations. The expressions in brackets 
in Figure 5.2 illustrate generalizing the envisionment of the first bit of M1's output 
being stuck-at 0. The resulting generalized rule is:* 

R6: IF (AND (= ?F (+ (decimal-stuck-at 1 0 (* ?A ?C)) 
                     (* ?B ?D))) 
            (= ?G (+ (* ?B ?D) (* ?C ?E)))) 
    THEN (fault-hypothesis '(decimal-stuck-at 1 0 (OUTPUT M1))) 

The envisionment that eliminated the hypothesis that the zeroth bit of M1's 
output is stuck-at 1 can also be generalized. It yields the rule: 

R7: IF (NOT (= ?F (+ (decimal-stuck-at 0 1 (* ?A ?C)) 
                     (* ?B ?D)))) 
    THEN (not-fault-hypothesis '(decimal-stuck-at 0 1 (OUTPUT M1))) 
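
For readers more comfortable with a procedural notation, the two rules can be 
rendered as predicates over the circuit's inputs and observed outputs. This is an 
assumed translation for illustration: the function names, the argument order, and 
the sample input values are not from the thesis, and the bit and stuck-at values 
follow the prose description of each hypothesis (first bit stuck at 0 for R6, 
zeroth bit stuck at 1 for R7). 

```python
def decimal_stuck_at(bit, value, number):
    """Return `number` with binary digit `bit` (0 = least significant) forced to `value`."""
    mask = 1 << bit
    return number | mask if value == 1 else number & ~mask

def r6(A, B, C, D, E, F, G):
    """True when the observations confirm that bit 1 of M1's output is stuck at 0."""
    return (F == decimal_stuck_at(1, 0, A * C) + B * D
            and G == B * D + C * E)

def r7(A, B, C, D, E, F, G):
    """True when the observations rule out bit 0 of M1's output being stuck at 1."""
    return F != decimal_stuck_at(0, 1, A * C) + B * D

# Assumed inputs and observations for the first case (G = 12 is an assumption).
case = dict(A=3, B=2, C=2, D=3, E=3, F=10, G=12)
confirmed = r6(**case)
eliminated = r7(**case)
```

Checking a rule is a single expression evaluation over the inputs and outputs; no 
intermediate values such as X, Y, or Z are recorded along the way. 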

Note that encapsulations of envisionments give only sufficient conditions for the 
confirmation or elimination of a fault hypothesis. The reason is that any particular 
fault envisionment may use only part of the behavior of some components. Hence, 
even if the inferences in a particular envisionment confirming a fault hypothesis do not 
apply to a new case, the program cannot eliminate the fault hypothesis. As it turns 
out, R6 does give necessary and sufficient conditions for checking the hypothesis that 
the first bit of M1's output is stuck-at 0, but that cannot be counted on in general. 

*decimal-stuck-at is a function that takes three arguments: a bit to stick, the value it 
is stuck at (0 or 1), and a decimal number. The bit can be any number from 0 up to the 
number of bits in the decimal number. 

5.2.1 Utility of EBG on Envisionments 

The program can use the generalized rules to jump to the conclusion that a 
fault hypothesis is consistent (or inconsistent, depending on the rule) without having 
to perform the fault envisionment. For either kind of generalized rule, this is an 
example of using EBG to encapsulate successful patterns of behavior rule firings, 
as in Section 4.1. We draw on the analysis from that section to characterize when 
performance will improve. 

Search Reduction 

It is not always necessary to propagate values through every device component 
in order to prove the consistency (inconsistency) of a particular fault hypothesis. 
Generalized rules that identify fault hypotheses as consistent (inconsistent) can bias 
the search involved toward patterns of value propagations that were useful in solving 
past cases. As discussed in Section 4.1.1, this search bias will be effective to the 
extent that many value propagations are never used in proving the consistency (in- 
consistency) of a particular fault hypothesis. Depending on the distribution of cases 
presented to the problem solver, the search bias may be more or less effective. 

In addition, the topology of the device may be such that some components will 
never contribute to proving a particular fault hypothesis inconsistent, regardless of 
the actual distribution of cases presented to the problem solver. For example, a 
component X may contribute to only one or a few circuit outputs. Hence, only 
components that can contribute to the same outputs X contributes to will ever be 
useful in proving that a fault hypothesis about X is inconsistent. Generalized rules 
that prove the inconsistency of a fault hypothesis about component X will all ignore 
the "irrelevant" components, regardless of the distribution of cases. In performing 
a fault envisionment for X, however, the original problem solver would not know to 
ignore those "irrelevant" components, so the generalized rules reduce the amount of 
search. 
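
The topological argument can be made concrete with a reachability sketch. The 
wiring table below is an assumed encoding of the polybox circuit, and the function 
names are hypothetical; the point is only that the set of potentially relevant 
components is determined by the device structure, not by the cases seen. 

```python
# Assumed polybox wiring: each component maps to the components or outputs it drives.
POLYBOX = {
    "M1": ["A1"],
    "M2": ["A1", "A2"],
    "M3": ["A2"],
    "A1": ["F"],
    "A2": ["G"],
}
OUTPUTS = {"F", "G"}

def outputs_reached(component, drives, outputs):
    """Circuit outputs that `component` can influence, found by graph traversal."""
    seen, stack, reached = set(), [component], set()
    while stack:
        node = stack.pop()
        if node in outputs:
            reached.add(node)
            continue
        if node in seen:
            continue
        seen.add(node)
        stack.extend(drives.get(node, []))
    return reached

def relevant_components(component, drives, outputs):
    """Components sharing at least one circuit output with `component`; only these
    can ever help prove a fault hypothesis about `component` inconsistent."""
    targets = outputs_reached(component, drives, outputs)
    return {c for c in drives if outputs_reached(c, drives, outputs) & targets}

relevant_to_m1 = relevant_components("M1", POLYBOX, OUTPUTS)
relevant_to_m2 = relevant_components("M2", POLYBOX, OUTPUTS)
```

Under this wiring, M1 reaches only F, so only M1, M2, and A1 are relevant to a 
hypothesis about M1, while M3 and A2 are the "irrelevant" components that a full 
envisionment would still simulate. M2 reaches both outputs, so every component is 
relevant to it. 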

Effect of Encapsulation 

The second source of power from generalizing successful patterns of inferences 
lies in checking the weakest preconditions and jumping to conclusions rather than 
computing all of the intermediate steps. Since checking a generalized rule requires 
paying all of the behavior costs of the encapsulated rule firings, and some rule firings 
will be encapsulated in several generalized rules, the benefits depend on the relative 
costs of evaluating the bodies of behavior rules versus the overhead costs of triggering 
behavior rules (binding variables in their left-hand sides) and storing their results. 
The cost of evaluating the body of a behavior rule depends on the complexity of 
the component. The overhead costs should be low when doing fault envisionment, 
as is argued below. The cost of triggering behavior rules with a constraint network 
is low. It is not necessary to keep track of dependencies, because the program is 
only interested in finding out whether a fault hypothesis is consistent, so recording 
a new value should be inexpensive. Remember that recording dependencies was 
a major cost of asserting a new value in Section 2.1, where the program needed 
the dependency structure in order to find the components supporting derivations of 
contradictory values. Hence, the effect of encapsulation in speeding up performance 
of fault envisionment should be minor if the component behaviors are inexpensive, 
and may be negative if the component behaviors are expensive. 
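
The trade-off argued above can be made concrete with a toy cost model. The following is only a sketch under stated assumptions: the two rules, the unit costs, and the function names are hypothetical, and each rule firing is charged a single flat overhead for triggering and storing.

```python
# Two hypothetical generalized rules; the firing of M2 is encapsulated in both.
RULES = [("M1", "M2", "A1"), ("M2", "M3", "A2")]


def envisionment_cost(rules, behavior_cost, overhead):
    """Value propagation: each distinct firing runs once and pays
    its behavior cost plus the rule-system overhead."""
    distinct = {firing for rule in rules for firing in rule}
    return sum(behavior_cost[f] + overhead for f in distinct)


def generalized_rules_cost(rules, behavior_cost):
    """Generalized rules: every encapsulated firing is re-evaluated in
    each rule that contains it, but no per-firing overhead is paid."""
    return sum(behavior_cost[f] for rule in rules for f in rule)


def saving(cost_per_behavior, overhead):
    """Cost of envisionment minus cost of checking the generalized rules."""
    costs = {name: cost_per_behavior for rule in RULES for name in rule}
    return (envisionment_cost(RULES, costs, overhead)
            - generalized_rules_cost(RULES, costs))
```

With a low overhead of 1, cheap behaviors (cost 1) give a small positive saving, while expensive behaviors (cost 10) make the saving negative, because the shared firing of M2 is paid for twice.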



Summary 

The program may be able to use generalized fault envisionments to reduce the 
cost of checking the consistency of a fault hypothesis. It can use generalized rules to 
bias the search for a proof of the consistency (or inconsistency) of a fault hypothesis 
if some components never contribute to proving the consistency (or inconsistency) of 
a fault hypothesis. The overhead saving from checking generalized rules rather than 
propagating values will not be significant, because the program need not keep track of 
dependencies during fault envisionments. The overhead saving may be overshadowed 
by the cost of computing some components' behavior repeatedly in several generalized 
rules, especially if the components have complex behavior. 



5.3 Lifting Fault Hypotheses 

It is possible to construct both necessary and sufficient conditions for recognizing 
whether or not a fault hypothesis is consistent with observations. We call the process 
lifting. A fault hypothesis for a component is lifted to a fault hypothesis for the 
device using a symbolic fault envisionment. As illustrated in Figure 5.3, a symbolic 
fault envisionment is an envisionment in which variables are used for the inputs. 
The symbolic simulation can be packaged into a generalized rule for checking the 
consistency of a given fault hypothesis in future cases. The generalized rule simply 
composes the behaviors (and misbehaviors) of the components: 

Figure 5.3: A carry-chain adder composed of four full adders. Lifting the fault 
hypothesis that FA1's carry-bit is stuck-at 1. S refers to the sum-bit (the low-order 
bit of the sum) of three single-bit arguments and C refers to the carry-bit (the 
high-order bit). 



R8:

IF (AND (= ?S1 (S ?A1 ?B1 ?C0))
        (= ?S2 (S ?A2 ?B2 (STUCK-AT 1 (C ?A1 ?B1 ?C0))))
        (= ?S3 (S ?A3 ?B3
                  (C ?A2 ?B2 (STUCK-AT 1 (C ?A1 ?B1 ?C0)))))
        (= ?S4 (S ?A4 ?B4
                  (C ?A3 ?B3
                     (C ?A2 ?B2
                        (STUCK-AT 1 (C ?A1 ?B1 ?C0))))))
        (= ?C4 (C ?A4 ?B4
                  (C ?A3 ?B3
                     (C ?A2 ?B2
                        (STUCK-AT 1 (C ?A1 ?B1 ?C0)))))))
THEN (FAULT-HYPOTHESIS '(STUCK-AT 1 (C-OUT FA1)))

While this can be viewed as an application of EBG, here the reasoning process that 
is generalized is not one that is employed by the original problem solver. Our previous 
use of EBG has been to encapsulate only that part of the components' behavior that 
was actually used to deduce some conclusion, and not the most general behavior of 
all of the components in the circuit. The symbolic fault simulation contains all of 
the behavior of the circuit components. As a result, encapsulating it yields necessary 
and sufficient conditions for the observations to be consistent with a particular fault 
hypothesis. 
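
As an illustration, the lifted rule R8 can be written as an ordinary predicate over the device observations. This is a minimal sketch, not the thesis's implementation: the function names and the convention that bit tuples are ordered least-significant first are assumptions made here.

```python
def S(a, b, c):
    """Sum bit of a full adder (low-order bit of a + b + c)."""
    return a ^ b ^ c


def C(a, b, c):
    """Carry bit of a full adder (high-order bit of a + b + c)."""
    return (a & b) | (a & c) | (b & c)


def r8_holds(a_bits, b_bits, c0, s_bits, c4):
    """True iff the observations are consistent with FA1's carry being
    stuck-at 1. The faulty carry is composed with the correct behavior
    of every other component, so the test is necessary and sufficient."""
    a1, a2, a3, a4 = a_bits
    b1, b2, b3, b4 = b_bits
    c1 = 1  # (STUCK-AT 1 (C a1 b1 c0)): the faulty carry ignores its inputs
    c2 = C(a2, b2, c1)
    c3 = C(a3, b3, c2)
    expected_sums = (S(a1, b1, c0), S(a2, b2, c1), S(a3, b3, c2), S(a4, b4, c3))
    return tuple(s_bits) == expected_sums and c4 == C(a4, b4, c3)
```

For example, with all inputs zero the faulty adder produces a 1 on the second sum bit, so `r8_holds((0, 0, 0, 0), (0, 0, 0, 0), 0, (0, 1, 0, 0), 0)` is true, while the all-zero output of a healthy adder is rejected.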

There are difficult issues involved in symbolic simulation that are beyond the scope 
of this thesis. First, the symbolic simulation may involve iterative behavior, so that 
the straightforward simulation may not end. Weld and others [Wel86, SD87, CMB88] 
addressed the issue of noticing and generalizing iterative behavior in simulations. 
Generalizations of their techniques may apply to symbolic simulations. Second, com- 
ponents that compute conditional outputs (e.g. multiplexers) may make the com- 
posed behavior expressions prohibitively large. 

5.3.1 Utility of Lifting 

The analysis of the utility of lifted rules is somewhat different than previous utility 
analyses because a lifted rule provides necessary as well as sufficient conditions for 
deciding the consistency of a fault hypothesis. There can be only one lifted rule for a 
given fault hypothesis and the only question is whether checking the lifted rule takes 
more or less time than performing a fault envisionment. 

A lifted rule will increase the number of component behaviors that are simulated, 
but avoids the overhead costs of running behavior rules. Unlike the generalizations 
of fault envisionments, a lifted rule encapsulates all of the behavior of the circuit 
components. Any particular fault envisionment may use the behavior of only some of 
the components. Hence, at least as much, and possibly more, component behavior is 
simulated in checking a lifted rule as would be simulated during a fault envisionment. 
On the other hand, checking a lifted rule avoids the matching of component rule 
preconditions and the storage of intermediate results. To repeat a frequent theme 
in this thesis: the overall utility of a lifted rule depends on the relative costs of the 
component behaviors versus the overhead costs of the rule system. 

The utility of a lifted rule may be increased by simplifying the expressions in its 
preconditions. In general, expression simplification is an intractable problem. Some 
guidance may be available, however, from a design verification, a proof that the 
behavior specification for a device is met by its implementation. Design verifications 
might plausibly be generated during the design process, either by human designers 
or by a computer program. Simplifying the expressions in the preconditions of the 
lifted rules will reduce the cost of checking them. 

Design verifications may make the problem tractable by providing some guidance 
to the process of expression simplification. The chief heuristic is: 

• If the behavior of only one component changes, a proof that the circuit imple- 
ments its designed behavior may go through almost unchanged. 

A hand demonstration of this idea is given below. Automating this process is a 
very interesting problem for future research, since real cases are likely to be much 
more difficult than the example below. 

Consider the carry-chain adder in Figure 5.3. FA1 through FA4 each compute the sum 
of three one-bit numbers. The component behavior rules compute (S a b c), the sum-bit 
of inputs a, b, and c, and (C a b c), the carry-bit. The circuit adds two decimal numbers 
between 0 and 15, plus a single-bit carry-in. The proof that the circuit meets the 
design specification relies on two abstractions: the conversion from binary to decimal 
representation of integers and the positional representation of binary numbers on the 
wires of the circuit. These abstractions are reflected in the following equations, which 
are used repeatedly in the design verification. 



A = a_1 + 2a_2 + 4a_3 + 8a_4
B = b_1 + 2b_2 + 4b_3 + 8b_4

Output = s_1 + 2s_2 + 4s_3 + 8s_4 + 16c_4

a + b + c = (S a b c) + 2(C a b c)

A proof that the adder is implemented by its substructure is shown in Figure 5.4. 
We assume that such a proof would be provided as an input to a diagnostic program. 
The first two lines of the derivation give the output as a composition of the component 
behaviors. This composition of behaviors is then simplified, using the behavioral 
abstraction equations above. 

Now consider the effect on this proof of assuming that FA1 is computing an 
incorrect carry-bit. This might be the case if the carry-bit were stuck-at 1. More 
generally, suppose that the carry bit computed is some function g of the inputs. Thus, 
FA1(a_1 b_1 c_0) = (S a_1 b_1 c_0) + 2(g a_1 b_1 c_0). Intuitively, the adder circuit will now add 
its inputs, but with an error term corresponding to twice the error on the carry-bit 
of FA1. In Figure 5.5 we see that, by expressing the faulty behavior as the correct 
behavior plus an error term, essentially the same proof goes through, except that an 
error term is left over. 

Output = s_1 + 2s_2 + 4s_3 + 8s_4 + 16c_4

       = (S a_1 b_1 c_0) + 2(S a_2 b_2 (C a_1 b_1 c_0))
         + 4(S a_3 b_3 (C a_2 b_2 (C a_1 b_1 c_0)))
         + 8(S a_4 b_4 (C a_3 b_3 (C a_2 b_2 (C a_1 b_1 c_0))))
         + 16(C a_4 b_4 (C a_3 b_3 (C a_2 b_2 (C a_1 b_1 c_0))))

       = (S a_1 b_1 c_0) + 2(S a_2 b_2 (C a_1 b_1 c_0))
         + 4(S a_3 b_3 (C a_2 b_2 (C a_1 b_1 c_0)))
         + 8(a_4 + b_4 + (C a_3 b_3 (C a_2 b_2 (C a_1 b_1 c_0))))

       = (S a_1 b_1 c_0) + 2(S a_2 b_2 (C a_1 b_1 c_0))
         + 4(a_3 + b_3 + (C a_2 b_2 (C a_1 b_1 c_0)))
         + 8(a_4 + b_4)

       = (S a_1 b_1 c_0) + 2(a_2 + b_2 + (C a_1 b_1 c_0))
         + 4(a_3 + b_3) + 8(a_4 + b_4)

       = c_0 + (a_1 + b_1) + 2(a_2 + b_2) + 4(a_3 + b_3) + 8(a_4 + b_4)

       = c_0 + A + B

Figure 5.4: A design verification for the carry-chain adder.

Note that this technique will work even with multiple faults, although there will 
be correspondingly more error terms. The result of this replayed proof would be the 
following, simpler version of R8: 

R8':

IF (AND (= ?Output
           (+ ?C0 ?A ?B
              (* 2 (- (g ?A1 ?B1 ?C0) (C ?A1 ?B1 ?C0))))))
THEN (FAULT-HYPOTHESIS '(STUCK-AT 1 (CARRY-OUT FA1)))

Even though R8' encapsulates the behavior of every component, evaluating the 
simplified expressions may be more efficient than simulating the components individ- 
ually in a fault simulation. 
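
The claim that the simplified precondition agrees with component-by-component simulation can be checked exhaustively for the four-bit adder. The sketch below is a hypothetical check, not part of the thesis's program: the helper names, the least-significant-first bit order, and the representation of g as a Python function are assumptions.

```python
from itertools import product


def S(a, b, c):
    """Sum bit of a full adder."""
    return a ^ b ^ c


def C(a, b, c):
    """Carry bit of a full adder."""
    return (a & b) | (a & c) | (b & c)


def simulate(a_bits, b_bits, c0, g=None):
    """Ripple through four full adders and return the numeric output.
    If g is given, it replaces the carry function of FA1 only."""
    carry, output = c0, 0
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        output += S(a, b, carry) << i
        carry_fn = g if (g is not None and i == 0) else C
        carry = carry_fn(a, b, carry)
    return output + (carry << 4)


def r8_prime(a_bits, b_bits, c0, g):
    """Right-hand side of R8': c0 + A + B + 2 * (g - C) on FA1's inputs."""
    A = sum(bit << i for i, bit in enumerate(a_bits))
    B = sum(bit << i for i, bit in enumerate(b_bits))
    a1, b1 = a_bits[0], b_bits[0]
    return c0 + A + B + 2 * (g(a1, b1, c0) - C(a1, b1, c0))


def stuck_at_1(a, b, c):
    """Faulty carry function: always 1, whatever the inputs."""
    return 1


def agree_everywhere(g):
    """Compare simulation and R8' on all 2**9 input combinations."""
    return all(
        simulate(bits[:4], bits[4:8], bits[8], g)
        == r8_prime(bits[:4], bits[4:8], bits[8], g)
        for bits in product((0, 1), repeat=9)
    )
```

Calling `agree_everywhere(stuck_at_1)` compares the two over every input; the same check can be run with any other 0/1-valued carry function g.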



5.3.2 Summary 

Lifted rules may be more efficient than fault simulation for checking the consistency 
of fault hypotheses, if expressions in the lifted rules' preconditions can be 
simplified. An interesting direction for future research would be to automate the use 
of design verifications to guide the simplification of lifted rules' preconditions. 

Output = s_1 + 2s_2 + 4s_3 + 8s_4 + 16c_4

       = (S a_1 b_1 c_0) + 2(S a_2 b_2 (g a_1 b_1 c_0))
         + 4(S a_3 b_3 (C a_2 b_2 (g a_1 b_1 c_0)))
         + 8(S a_4 b_4 (C a_3 b_3 (C a_2 b_2 (g a_1 b_1 c_0))))
         + 16(C a_4 b_4 (C a_3 b_3 (C a_2 b_2 (g a_1 b_1 c_0))))

       = (S a_1 b_1 c_0) + 2(S a_2 b_2 (g a_1 b_1 c_0))
         + 4(S a_3 b_3 (C a_2 b_2 (g a_1 b_1 c_0)))
         + 8(a_4 + b_4 + (C a_3 b_3 (C a_2 b_2 (g a_1 b_1 c_0))))

       = (S a_1 b_1 c_0) + 2(S a_2 b_2 (g a_1 b_1 c_0))
         + 4(a_3 + b_3 + (C a_2 b_2 (g a_1 b_1 c_0)))
         + 8(a_4 + b_4)

       = (S a_1 b_1 c_0) + 2(a_2 + b_2 + (g a_1 b_1 c_0))
         + 4(a_3 + b_3) + 8(a_4 + b_4)

       = (S a_1 b_1 c_0)
         + 2(a_2 + b_2 + (C a_1 b_1 c_0) + (g a_1 b_1 c_0) - (C a_1 b_1 c_0))
         + 4(a_3 + b_3) + 8(a_4 + b_4)

       = c_0 + (a_1 + b_1) + 2(a_2 + b_2) + 2((g a_1 b_1 c_0) - (C a_1 b_1 c_0))
         + 4(a_3 + b_3) + 8(a_4 + b_4)

       = c_0 + A + B + 2((g a_1 b_1 c_0) - (C a_1 b_1 c_0))

Figure 5.5: Using the design verification to guide simplification of the preconditions 
of R8. 

5.4 Exploitability 

Constraint suspension sometimes provides enough information about how a component 
must be misbehaving that the program can identify the consistent fault hypotheses 
for the component without resorting to fault envisionment. For example, 
in performing constraint suspension on M1 while diagnosing the polybox circuit in 
Figure 5.1(a), the program would predict the value 4 at M1's output from 10 at F 
and 6 at Y. Given values for M1's inputs and outputs, the program can check the 
consistency of the fault hypothesis that M1's zeroth output bit is stuck high (it is 
consistent) without resorting to fault envisionment for the whole circuit. In such 
situations, the program should not check any generalized rules constructed from fault 
envisionments or lifted rules. 

There are also situations, however, in which it is necessary to use fault envisionment 
or generalized rules to check fault hypotheses for a component given only the 
device's inputs and outputs. Figure 5.6 demonstrates a simple example of a circuit in 
which constraint suspension is unable to predict inputs and outputs for some components 
(the presence of reconvergent fanout often leads to failure of local propagation). 
With M2 turned off, no value is predicted at X, even though M2 must produce the 
value 4 in order to account for the device behavior.^ 

Figure 5.6: Envisionments are needed to check the consistency of fault hypotheses 
for M2. 

During candidate generation, constraint suspension is performed on every suspect 
that is not exonerated. Thus, it is easy for the program to identify when the gen- 
eralized (or lifted) rules should be checked. The generalized rules are checked only 
when constraint suspension fails to identify how the candidate must be misbehaving 
in order to account for the device observations. 



^Note that if H were 21, M2 would not be exonerated during candidate generation, even though 
it would have to output 4.5 to account for the device behavior. 

5.5 Related Work 

Pazzani's ACES program [Paz86] for diagnosing failures in the attitude control 
system of a satellite used EBG to generalize what we call fault envisionments that 
proved the inconsistency of fault hypotheses. Pazzani presented empirical evidence 
demonstrating that EBG improved performance on a few examples, but did not 
present an analysis of the source of that speedup. As discussed in Section 5.2, the 
key factor, if performance is to improve on a large set of examples, is that the behavior 
of some components never contributes to proving the inconsistency of certain 
fault hypotheses. Hence, even though there may be more than one generalized rule 
for identifying a fault hypothesis as inconsistent, the behavior of the irrelevant 
components will not be encapsulated in any of those generalized rules. 

The idea of composing component behaviors and then simplifying expressions is 
not new. Weise's Silica Pithecus [Wei86] and Barrow's VERIFY [Bar84] both do so 
in order to verify device designs. Hall, Lathrop, and Kirk [HLK87] use the idea to 
turn a structural and behavioral description for a device into a faster simulator. In 
all three cases, brute force or ad hoc heuristic methods were used to guide expression 
simplification. What is novel in our work is the idea of replaying a design verification 
to guide the simplification of expressions. 

5.6 Conclusion 

This chapter defined sets of observations for a circuit to be similar if they were 
consistent with the same fault hypothesis. We presented two methods for constructing 
generalized rules that could check for such similarities. The first was to generalize the 
reasoning process used in fault envisionments. The second was to lift a description of 
a component misbehavior into a description of a device misbehavior, using symbolic 
simulation. We suggested, but did not implement, the use of design verifications to 
guide the simplification of the lifted behavior expressions. 



Chapter 6 

Extensions: Relaxing Similarity 
Definitions 



This chapter suggests directions for future research on ways to relax the two notions 
of similarity described in previous chapters. First, in Chapter 2, we defined two sets of 
observations for the same device to be similar if the same derivation of contradictory 
values was applicable to both. That is, two sets of observations were similar if the 
observations could be propagated through exactly the same components, leading to a 
contradiction at exactly the same location. In this chapter we go further, and define 
notions of similarity for patterns of inferences, which in turn allows us to define sets 
of observations as similar if merely similar derivations of contradictory values are 
applicable. Second, in Chapter 5, we defined two cases to be similar if they were 
consistent with exactly the same fault hypothesis (i.e., misbehavior for a particular 
component). In this chapter, we define notions of similarity for fault hypotheses, 
which allows us to define two cases as similar if they are consistent with merely 
similar fault hypotheses. 

Figure 6.1 summarizes the recursive nature of the similarity definitions in this 
thesis. Defining similarity for sets of observations is reduced to defining similarity 
for fault hypotheses (or patterns of inferences), and so on. As we will see in this 
chapter, the recursive definitions must bottom out either in a strict equality test, or 
in a primitive definition of similarity that is provided to the program. 

6.1 Similarities Between Patterns of Inferences 

As described above, we can define similarity of observations in terms of similarity 
of patterns of inferences (derivations): two sets of observations are similar if similar 
patterns of inferences are applicable to them. We now have to define similarity for 
patterns of inferences, which we do in two ways. First, we use information about 
equivalent roles that different components play to associate sequences of value 
propagations that occur in different places in the circuit. Second, we define two patterns 
to be similar if they lead to the same conclusion. 

Figure 6.1: The definitions of similarity proposed in this thesis. Moving to the right 
and down indicates definitions that classify more cases as similar. (Labels recovered 
from the figure: Same Conclusion; Role Equivalent.) 

6.1.1 Role Equivalent Conflict Sets 

Often, many components in a device perform the same role. For example, M1 in 
the polybox performs the same role as M3, and A1 the same role as A2. In the 
carry-lookahead adder each of the XOR gates is computing a sum bit. We assume the 
program is given information about role equivalences of components as part of the 
device description. An even more ambitious project would be to have the program 
deduce the role equivalences from the structure and behavior description of the device. 

We use role equivalences to define two derivations as similar if they use "equiva- 
lent" components. This leads to the definition of two sets of observations as similar 
if derivations of contradictory values using "equivalent" components can be instanti- 
ated in both. Using this notion, the learning program could construct and generalize 
derivations of additional conflict sets that are analogous to those derived in diagnosing 
a specific case. 

An easy example of this idea occurs in diagnosis of the polybox circuit. Figure 2.4 
showed the derivation of a contradiction at output F, using components M1, M2, and 
A1, from which EBG generated the rule: 

R1: IF (NOT (= ?F (+ (* ?A ?C) (* ?B ?D)))) 
    THEN (CONFLICT-SET '(M1 M2 A1)) 

Suppose the program is given the role equivalence that M1 plays the same role as 
M3 and A1 plays the same role as A2. Simply by replacing each component in the 
derivation of the contradiction at output F with its role equivalent, then generalizing 
the new derivation, the program could generate the rule: 

R10: IF (NOT (= ?G (+ (* ?C ?E) (* ?B ?D)))) 
     THEN (CONFLICT-SET '(M3 M2 A2)) 
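
The substitution step can be sketched as follows. This is an illustrative toy, not the thesis's EBG machinery: the wiring table, the internal wire names X, Y, and Z, the input order chosen for A2, and all function names are assumptions made for the example.

```python
# Hypothetical polybox wiring: component -> (operator, input wires, output wire).
POLYBOX = {
    "M1": ("*", ("A", "C"), "X"),
    "M2": ("*", ("B", "D"), "Y"),
    "M3": ("*", ("C", "E"), "Z"),
    "A1": ("+", ("X", "Y"), "F"),
    "A2": ("+", ("Z", "Y"), "G"),
}

# Role equivalences supplied with the device description.
ROLE_EQUIVALENT = {"M1": "M3", "A1": "A2"}


def expression(wire, components):
    """Compose the behaviors of the given components backwards from a wire."""
    for name in components:
        op, inputs, output = POLYBOX[name]
        if output == wire:
            args = " ".join(expression(w, components) for w in inputs)
            return f"({op} {args})"
    return f"?{wire}"  # not produced by any listed component: a rule variable


def conflict_rule(components):
    """Generalize a derivation that ends at the output of its last component."""
    out = POLYBOX[components[-1]][2]
    members = " ".join(components)
    body = expression(out, components)
    return f"IF (NOT (= ?{out} {body})) THEN (CONFLICT-SET '({members}))"


def analogous(components):
    """Replace every component that has a role equivalent with that equivalent."""
    return [ROLE_EQUIVALENT.get(c, c) for c in components]
```

Here `conflict_rule(["M1", "M2", "A1"])` rebuilds R1, and `conflict_rule(analogous(["M1", "M2", "A1"]))` yields R10 without diagnosing a second case.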

Constructing an analogous conflict set for the adder circuit is more difficult. A 
derivation of a contradiction on the first output bit should be analogous to a deriva- 
tion of a contradiction on any other bit. A contradiction on the third output bit, 
however, can depend on more inputs than a contradiction at the first bit. Construct- 
ing the analogous derivation of a contradiction would require more effort than simply 
substituting "equivalent" components for equivalents in the original derivation. We 
leave this as a problem for future research. 

It is interesting to note that while constructing and generalizing analogous derivations 
of contradictions can speed up the learning process, it would not affect performance 
in the long run. A rule like R10 above can provide some savings the first time 
it is applicable. However, it is only in diagnosing that first case to which it applies 
that any benefit is gained, because R10 would be constructed during the diagnosis 
of that first case if it had not been constructed previously. Hence, constructing 
and generalizing analogous derivations of conflict sets is an interesting idea, but not 
a useful one for performance learning. 

Figure 6.2: The two examples allow different derivations that yield the same conflict 
set, (O1 A1). 



6.1.2 Same Conclusion; Different Reasoning 

Another way to define two patterns of inferences as similar is if they both lead 
to the same conclusion. This leads to defining two sets of observations as similar 
if the diagnostic engine can reach the same conclusion from both, perhaps by a 
different line of reasoning. This weaker notion of similarity leads us to try to construct 
a single generalized rule that is applicable when any of the patterns of inferences 
leading to a particular conclusion are applicable. There are two potential efficiency 
advantages to combining the preconditions of several generalized rules that have 
the same conclusions into a single rule. First, it may be possible to collapse the 
preconditions, making it more efficient to check the single combined rule than all of 
the individual ones. Second, the program may be able to conclude that it has found 
all of the possible ways to derive a particular conclusion, so that the combined rule 
provides necessary and sufficient conditions for reaching that conclusion, rather than 
just sufficient conditions. 



Alternate Derivations of the Same Conclusion 

Figure 6.2 shows a situation where it is possible to have two derivations of 
contradictions that yield the same conflict set. The value 1 at X can be predicted either 
from a 1 at A or from a 1 at B. Consider now the generalizations resulting from two 
cases. The first case has A=1, B=0, C=1, and D=0. O1 and A1 together predict 1 
at D, which is a contradiction. The generalization is: 

R11: IF (AND (= ?A 1)
             (NOT (= ?D (AND 1 ?C))))
     THEN (CONFLICT-SET '(O1 A1))

The second case has A=0, B=1, C=1, and D=0. Again there is a contradiction 
at D. The generalization is: 

R12: IF (AND (= ?B 1)
             (NOT (= ?D (AND 1 ?C))))
     THEN (CONFLICT-SET '(O1 A1))



Collapsing Preconditions. The above two rules can be combined into a single 
disjunctive rule, whose preconditions can then be collapsed. If both rules are checked 
independently, the predicate (NOT (= ?D (AND 1 ?C))) will be evaluated twice. The 
following combined rule is more efficient to check: 

R13: IF (AND (OR (= ?A 1) (= ?B 1))
             (NOT (= ?D (AND 1 ?C))))
     THEN (CONFLICT-SET '(O1 A1))

Hence, by combining the two rules and collapsing the preconditions, the cost of 
checking all of the generalized rules during diagnosis can be reduced, thus improving 
the overall performance. 
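
The saving from collapsing can be illustrated by counting how often the shared predicate is evaluated. This is a sketch under assumptions: the rules are encoded as Python functions with short-circuit evaluation, and the counter and helper names are hypothetical.

```python
from itertools import product

calls = {"shared": 0}


def shared(c, d):
    """(NOT (= ?D (AND 1 ?C))), instrumented to count evaluations."""
    calls["shared"] += 1
    return d != (1 & c)


def r11(a, b, c, d):
    return a == 1 and shared(c, d)


def r12(a, b, c, d):
    return b == 1 and shared(c, d)


def r13(a, b, c, d):
    return (a == 1 or b == 1) and shared(c, d)


def check_all(rules):
    """Check every rule on all 16 observation vectors (A, B, C, D).
    Returns, per vector, whether the conflict set was found, plus the
    number of times the shared predicate was evaluated."""
    calls["shared"] = 0
    # The inner list forces every rule to be checked, as separate rules would be.
    fired = [any([rule(*obs) for rule in rules])
             for obs in product((0, 1), repeat=4)]
    return fired, calls["shared"]
```

Comparing `check_all([r11, r12])` with `check_all([r13])` shows that both report the conflict set in exactly the same cases, while the combined rule evaluates the shared predicate fewer times.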

Necessary and Sufficient Conditions 

The problem with a rule that gives only sufficient conditions for reaching its 
conclusion is that nothing can be concluded from the failure of the rule to apply. If 
a problem solver were to know that a rule has necessary and sufficient conditions 
for reaching its conclusions, the problem solver would be able to exploit negative 
results from checking the rule as well as positive results. This section discusses how 
the problem solver could exploit necessary and sufficient conditions for constructing 
conflict sets and also how the program might be able to generate them. 

There are two ways that the augmented diagnostic program of Chapter 2 could be 
improved if it knew that it had necessary and sufficient conditions for some conflict 
sets. First, the inapplicability of a rule with necessary and sufficient conditions for 
identifying a conflict set implies that no rule for constructing a conflict set that is a 
subset of the original will ever succeed. If it is not possible to derive a contradiction 
using M1, M2, and A1, it certainly will not be possible to derive a contradiction using 
only M1 and A1. Thus, having necessary conditions for some rules would enable the 
program to test those rules first; if they fail, the program need not consider rules that 
have smaller conflict sets as their conclusions. 
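
One way to exploit this ordering is sketched below. The rule representation (a conflict set, a predicate over observations, and a flag marking the conditions as necessary and sufficient) and the two example rules are hypothetical; the thesis does not specify a data structure.

```python
def check_rules(rules, observations):
    """Check rules with larger conflict sets first. When a rule flagged as
    necessary and sufficient fails, skip every rule whose conflict set is
    a subset of its conflict set. Returns (conflict sets found, rules checked)."""
    ruled_out, found, checked = [], [], 0
    for conflict_set, predicate, complete in sorted(rules, key=lambda r: -len(r[0])):
        if any(conflict_set <= bigger for bigger in ruled_out):
            continue  # a superset is already known to be underivable
        checked += 1
        if predicate(observations):
            found.append(conflict_set)
        elif complete:
            ruled_out.append(conflict_set)
    return found, checked


# Hypothetical rules for the polybox; Y is assumed observable for the second rule.
BIG = frozenset({"M1", "M2", "A1"})
SMALL = frozenset({"M1", "A1"})
RULES = [
    (SMALL, lambda o: o["F"] != o["A"] * o["C"] + o["Y"], False),
    (BIG, lambda o: o["F"] != o["A"] * o["C"] + o["B"] * o["D"], True),
]
```

When the observations are consistent with the larger rule, only that one rule is checked; the rule for the smaller conflict set is skipped entirely.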

Second, if the program has necessary and sufficient conditions for all of the possible 
conflict sets, it is no longer necessary to perform constraint suspension on the suspects 
that are still left after having checked the rules from the conflict set library. This 
improvement would make the algorithm the same as the optimistic algorithm of 
Section 3.9, while still guaranteeing the most specific diagnoses possible. 

Figure 6.3: B1 and B2 are buffers. The first example allows a derivation that yields 
the conflict set (B1 O1 A1). The second example allows a different derivation that 
yields a different conflict set, (B2 O1 A1). 

While a program may often have necessary and sufficient conditions for a 
particular conflict set, it is difficult for the program to know that it has necessary and 
sufficient conditions. There is hope, however, because there can be more than one 
derivation of a conflict set only in unusual circumstances. Consider the characteristics 
of the situation in Figure 6.2 that allowed there to be more than one derivation 
of conflict set (O1 A1). First, there are behavior rules for O1 that use only one of 
the inputs to predict the output. That makes it possible for one derivation to use 
one input of O1 and another derivation to use a different input. Second, each of O1's 
inputs is connected to the same set of components (in this case, none). 

As a contrasting example, consider Figure 6.3, in which the two inputs of O1 are 
connected to two different buffers, B1 and B2. As before, the two sets of observations 
allow two different derivations of contradictions at D, and EBG constructs a pair of 
rules with the same left-hand sides as R11 and R12. In this example, however, the conflict 
sets constructed would now include B1 in the first case and B2 in the second, so the 
two derivations yield different conflict sets. 

If the program can prove that it has seen all of the possible derivations of a 
particular conflict set, it will know that it has necessary and sufficient conditions for 
that conflict set. We hypothesize that the only way that there can be more than one 
derivation of the same conflict set is a situation where both inputs to a component are 
either inputs to the circuit, as in Figure 6.2, or are connected to the same component. 
A program may be able to use this hypothesis to prove that it has found all of the 
possible derivations of a particular conflict set. Future research is required, however, 
to check the validity of the hypothesis. 

6.2 Similarities Between Fault Hypotheses 

As described in the introduction to this chapter, we can define similarity of sets of 
observations in terms of similarities between fault hypotheses. A fault hypothesis is a 
proposed misbehavior for a component. We define similarities between fault hypothe- 
ses in two ways. First, they can be similar if they propose the same misbehavior for 
two components playing the same role. Second, they can be similar if they propose 
similar misbehaviors for the same components. 

6.2.1 Same Misbehavior; Role Equivalent Component 

We first consider two fault hypotheses to be similar if they propose the same 
misbehavior for components playing equivalent roles in the circuit. This leads us to 
define two sets of observations as similar if they are both consistent with a particular 
misbehavior on some component playing a particular role. 

In order to recognize such a similarity, we propose that the fault lifting technique 
of Section 5.3 be parameterized. The lifted fault hypotheses for different components 
playing the same role should not be very different. For example, the behavior of the 
carry-chain adder when any of the carry-bits of the component adders is stuck-at 1 
can be expressed as correct addition plus an error term. Ideally, one rule could be 
constructed with a parameterized error term; the rule would check simultaneously 
for the fault hypothesis on any of the components playing the same role. Checking 
the parameterized rule would be more efficient than checking a separate rule for each 
component playing the same role. 
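
One possible form of such a parameterized rule is sketched below. The error term 2^i * (stuck value - correct carry of FA_i) is our extrapolation of the FA1 case in Section 5.3.1, not a formula given in the text; the function names and the least-significant-first bit order are also assumptions.

```python
def S(a, b, c):
    """Sum bit of a full adder."""
    return a ^ b ^ c


def C(a, b, c):
    """Carry bit of a full adder."""
    return (a & b) | (a & c) | (b & c)


def simulate(a_bits, b_bits, c0, faulty=None, stuck=1):
    """Four-bit carry-chain adder. If faulty = i (1..4), the carry-out
    of FA_i is stuck at the value `stuck`."""
    carry, output = c0, 0
    for i, (a, b) in enumerate(zip(a_bits, b_bits), start=1):
        output += S(a, b, carry) << (i - 1)
        carry = stuck if i == faulty else C(a, b, carry)
    return output + (carry << 4)


def error_term(a_bits, b_bits, c0, i, stuck=1):
    """Assumed parameterized error: 2**i * (stuck - correct carry of FA_i).
    FA_i receives the true carry-in because only FA_i is faulty."""
    carry = c0
    for a, b in zip(a_bits[:i - 1], b_bits[:i - 1]):
        carry = C(a, b, carry)
    return (1 << i) * (stuck - C(a_bits[i - 1], b_bits[i - 1], carry))


def consistent_carry_faults(output, a_bits, b_bits, c0, stuck=1):
    """Check the stuck-at hypothesis for all four role-equivalent adders at once."""
    A = sum(bit << k for k, bit in enumerate(a_bits))
    B = sum(bit << k for k, bit in enumerate(b_bits))
    return [i for i in (1, 2, 3, 4)
            if output == c0 + A + B + error_term(a_bits, b_bits, c0, i, stuck)]
```

The correct sum c0 + A + B is computed once and shared by all four hypotheses; only the error term depends on the parameter i.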

6.2.2 Similar Misbehavior; Same Component 

Another way to define fault hypotheses as similar is if they propose similar mis- 
behaviors for the same components. Here, similarity for two misbehaviors means 
that there is some common generalization of the two. We assume that the diagnostic 
engine is given a hierarchy of misbehavior descriptions (sometimes called fault models). 
For example, unspecified-stuck-at is more general than both stuck-at-1 
and stuck-at-0. This leads to defining two sets of observations as similar if they 
are both consistent with the same general misbehavior for the same component. In 
order to recognize such similarities, the program could create a generalized rule with 
the lifting technique of Section 5.3, using the generalized misbehavior. 
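
A minimal sketch of this similarity test follows. Only the relation between unspecified-stuck-at, stuck-at-1, and stuck-at-0 comes from the text; the table representation, the extra fault model "inverted", and the function names are hypothetical.

```python
# Fault-model hierarchy: child -> parent (the more general description).
PARENT = {
    "stuck-at-0": "unspecified-stuck-at",
    "stuck-at-1": "unspecified-stuck-at",
}


def generalizations(fault):
    """The fault itself followed by its successively more general ancestors."""
    chain = [fault]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_generalization(fault1, fault2):
    """Most specific description that generalizes both faults, or None."""
    seen = set(generalizations(fault1))
    return next((f for f in generalizations(fault2) if f in seen), None)


def similar(hypothesis1, hypothesis2):
    """Hypotheses are (component, fault) pairs. They are similar if they
    name the same component and their faults share a generalization."""
    (comp1, fault1), (comp2, fault2) = hypothesis1, hypothesis2
    return comp1 == comp2 and common_generalization(fault1, fault2) is not None
```

The common generalization returned here is the misbehavior that would be handed to the lifting technique when constructing the generalized rule.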

6.3 Summary 

In this chapter we have suggested ways to relax the definitions of similarity used in 
Chapters 2 and 5. Section 6.1 defined similarity of observations in terms of similarity 
between patterns of inferences, then presented two notions of similarity for patterns 
of inferences. Section 6.2 defined similarity of observations in terms of similarity 
between fault hypotheses. Each new definition of similarity led to suggestions for 
how a learning program could recognize and exploit that kind of similarity. 



Chapter 7 
Conclusion 



This thesis has described and analyzed knowledge-rich techniques for learning from 
model-based diagnostic examples. One contribution of the research is the demonstra- 
tion that, using domain knowledge, it is possible to construct useful generalizations 
based on more than one kind of similarity. A second contribution is a detailed analysis 
of the performance of a program that constructs generalizations based on the patterns 
of inference that lead to predictions of contradictory values. A final contribution is 
the analysis of the sources of power in the use of Explanation-Based Generalization, 
one technology for constructing generalizations. 

7.1 Draining As Much As Possible From One 
Example 

The main thrust of this research has been to use domain knowledge to drain as 
much information as possible out of a single example. A program can drain more out 
of an example if it uses more domain knowledge. For example, using only the models 
of correct component behavior and the structure of the circuit, the program was able 
to construct generalized rules that recognize conflict sets (Chapter 2). Using infor- 
mation about the probable modes of failure for components, it was able to construct 
generalized rules that encode sufficient conditions for checking the consistency or in- 
consistency of specific fault hypotheses (Chapter 5). Section 5.3 proposed that the 
program might be able to use additional information, a design verification, to con- 
struct efficient rules that give necessary and sufficient conditions for the consistency 
of a fault hypothesis. Finally, Chapter 6 proposed ways to use information about 
role equivalence and fault hierarchies to drain even more from a single example. 

While the thesis shows that much can be drained from a single example, one 
conclusion of the analysis in Chapter 4 is that not all of the information needed for 
performance learning can be obtained by looking at isolated examples. Information 
about the distribution of examples may be crucial to deciding what to remember 
about the problem solver's past experience. 



7.2 Multiple Knowledge-based Notions of Similarity 

Figure 7.1 summarizes the definitions of similarity that we have proposed. Note 
the recursive nature of the definitions. For example, similarity of sets of observations 
is defined in terms of similarity of fault hypotheses, which is defined in terms of 
similarity of components. Eventually, such recursive definitions must bottom out 
either in a strict equality test (e.g., same component) or in some equivalence test 
that is supplied to the system (e.g., role equivalence for components). 

All of the definitions of similarity proposed in this thesis are knowledge-based.
That is, a program needs a model of the device, plus perhaps models of how it can 
fail, in order to construct generalizations based on those definitions of similarity. 

Inductive learning techniques do not use knowledge-based notions of similarity to 
guide generalization. Instead, inductive generalization algorithms are provided with 
an explicit inductive bias to guide generalization. For example, using version spaces 
([Mit82]) or a Valiant-style algorithm ([Val84]), the learning system is given a target 
language in which to construct generalizations. Typically, the target language is a 
restricted class of boolean combinations of surface features. In order to construct 
a generalization such as (= ?F (+ (* ?A ?C) (* ?B ?D))), found in the preconditions
of rule R1, a purely inductive learner would need to be given a target language
consisting of all the expressions built using =, +, and * as the operators and the 
device observables as variables. By contrast, the program described in Chapter 2 
does not need an explicit inductive bias to construct the expression (= ?F (+ (* 
?A ?C) (* ?B ?D))). The component behaviors determine the operators that will 
be used, and the way that the components are connected in the device guides the 
construction of the expression. 
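
The way component behaviors and connections together determine such an expression can be illustrated with a short sketch. It is not the program of Chapter 2: the wiring table, the internal wire names X and Y, and the function names are assumptions chosen so that the result matches the expression quoted above.

```python
# Hypothetical fragment of a device description (an assumption for illustration).
# Each entry maps a component's output wire to (operator, input wires).
COMPONENTS = {
    "X": ("*", ("A", "C")),  # a multiplier
    "Y": ("*", ("B", "D")),  # a second multiplier
    "F": ("+", ("X", "Y")),  # an adder fed by the two multipliers
}


def expression_for(wire):
    """Expand a wire into an expression over device inputs.

    The operators come from the component behaviors; the nesting comes
    from following the connections backward from the wire.
    """
    if wire not in COMPONENTS:
        return "?" + wire  # a device input becomes a rule variable
    op, inputs = COMPONENTS[wire]
    parts = [op] + [expression_for(w) for w in inputs]
    return "(" + " ".join(parts) + ")"


def precondition(output):
    """Build an equality test relating an output to the device inputs."""
    return "(= ?{} {})".format(output, expression_for(output))


print(precondition("F"))  # (= ?F (+ (* ?A ?C) (* ?B ?D)))
```

No target language is supplied here: the set of operators and the shape of the expression both fall out of the device description.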

The case-based approaches to learning from experience (e.g., [KSSC85]) tradition- 
ally have also considered surface notions of similarity rather than knowledge-based 
notions of similarity. The typical case-based problem solver indexes cases by the 
primitive features used to describe cases (e.g., the observed values for the circuit).
In solving a new problem, the problem solver retrieves "similar" cases from memory, 
where similarity is a metric on the surface features, typically a weighted sum of the 
shared features. In contrast, EBG and lifting allow a program to recognize similari- 
ties using composite features constructed during the generalization process. Recent 
work on case-based reasoning has also explored mechanisms like EBG to construct 
composite features, which are then used to index cases [Kot88, BM88]. 
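
The contrast between a surface metric and a composite feature can be made concrete with a small sketch. The two cases, the unit weights, and the function names below are assumptions for illustration; the composite feature is the relation among F, A, B, C, and D discussed earlier in this section.

```python
def surface_similarity(case_a, case_b, weights):
    """Weighted sum over the primitive features on which two cases agree."""
    return sum(
        weight
        for feature, weight in weights.items()
        if feature in case_a and case_a.get(feature) == case_b.get(feature)
    )


def violates_f_relation(case):
    """Composite feature: the observed F disagrees with A*C + B*D."""
    return case["F"] != case["A"] * case["C"] + case["B"] * case["D"]


# Two hypothetical sets of observations that share no observed value.
case_1 = {"A": 3, "B": 2, "C": 2, "D": 3, "F": 10}
case_2 = {"A": 1, "B": 4, "C": 5, "D": 1, "F": 7}
weights = {name: 1.0 for name in "ABCDF"}

print(surface_similarity(case_1, case_2, weights))
print(violates_f_relation(case_1), violates_f_relation(case_2))
```

Under the surface metric the two cases have nothing in common, yet both exhibit the same composite feature, so a rule indexed on that feature would treat them as similar.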
[Figure 7.1: diagram not reproduced. Node labels, with the chapter and section
numbers shown beside them: Same Observations; Similar Observations; Same Inferences
(2, 5.2); Similar Inferences; Using Role Equivalent Components (6.2.1); Same
Conclusion (6.2.2); Same Fault Hypothesis (5); Similar Fault Hypothesis; Similar
Misbehavior (6.3.2); Role Equivalent Component (6.3.1).]

Figure 7.1: The definitions of similarity proposed in this thesis. Moving to the right
and down indicates definitions that classify more cases as similar.

7.3 Finding Useful Definitions of Similarity

This research provides a case-study in finding useful grounds for similarity. We 
propose that the best approach is first to identify similarities that the problem solver 
can exploit, then to seek generalization mechanisms that can make those similarities 
manifest. For example, classifying cases by their conflict sets was appealing initially 
because the original diagnostic engine constructed conflict sets as intermediate results 
during diagnosis. EBG then provided a mechanism for making conflict sets manifest 
in the device's inputs and outputs.

Another contribution of this thesis is a detailed performance analysis of a learn- 
ing system that generalizes based on one notion of similarity. Chapter 3 presented 
experimental results from a program that uses EBG to encapsulate patterns of infer- 
ences leading to the construction of conflict sets. Single-fault candidate generation 
speed improved on both the polybox circuit and a gate-level implementation of a 
carry-lookahead adder. Analysis of the learning system identified three device char- 
acteristics that influence the utility of that use of EBG: 

• If only a few of the components account for all of the failures, then only a few 
generalized rules will be constructed, which will keep down the cost of checking 
the generalized rules. 

• If the component behavior is inexpensive to compute, the savings in overhead 
costs will outweigh the computation of additional component behaviors. 

• If the device topology is such that conflict sets tend to have few components 
in common, the benefits of the generalized rules will be high when they are 
applicable. 

7.4 The Sources of Power in EBG 

Since EBG is used throughout the thesis as a technology for constructing generalizations,
Chapter 4 analyzed the sources of power of that technology. It examined
two common uses of EBG to improve performance: generalizing successful problem 
solving episodes, and generalizing the explanations that search nodes are inconsistent. 

7.4.1 Using EBG to Encapsulate Patterns of Inferences Leading to a Goal 
State 

There are two sources of power in using EBG to generalize successful problem 
solving episodes. First, the generalized rules can act as remembered patterns of op- 
erator applications, to bias the problem solver's search toward patterns that have 
been useful in solving previous problems, and away from patterns that have never 
been useful. Second, the generalized rules encapsulate patterns of operator applications:
the program can check the preconditions of the whole pattern and jump
to the conclusions without incurring the overhead costs of binding variables for the 
operators and storing intermediate results. The following highlight key observations 
from our analysis: 

Biasing Search EBG biases the problem solver's search toward every pattern of 
operator applications that ever led to a goal state, regardless of how frequently 
a pattern did so. The bias will be effective to the extent that many patterns 
never lead to a goal state, which may happen either as an accident of the 
distribution of cases presented to the problem solver, or because the nature of 
the task ensures that some legal patterns of operator applications never lead to 
a goal state. 

Encapsulation Using a generalized rule involves all of the computation necessary 
to evaluate the bodies of the encapsulated operators, but not the computation 
necessary to trigger the operators and store their results. In addition, some 
operator applications may be encapsulated in several generalized rules. Hence, 
the benefits from encapsulation depend on the relative cost of evaluating the 
bodies of search operators versus the cost of binding variables for the operators' 
left-hand sides and storing the results of operator applications. 

Caveat: Searching For All Solutions If the problem solver's task is to find all of 
the solution states, using EBG to identify single solution states will not improve 
performance. Unless the program knows that its generalized rules provide an 
exhaustive enumeration of the legal search paths, it will have to explore the 
whole search space for solutions that the generalized rules failed to identify. 

7.4.2 Using EBG to Identify Inconsistent Search Nodes 

There are two potential sources of power in using EBG to generalize explanations 
of the inconsistency of search nodes. First, finding the inconsistency of a search node 
may be very expensive, and recognizing the applicability of a previously successful 
derivation of an inconsistency may reduce that cost. In this case, generalizing explanations
of failures is the same as generalizing successful patterns of inferences in
the space of derivations of inconsistencies. Performance may improve due to either 
search bias or encapsulation, or both. 

Second, knowing the inconsistency of one search node may enable the problem 
solver to ignore a large portion of the original search space. The problem solver 
may cut off search either below or above the search nodes that the generalized rules 
identify as inconsistent. 
Cutting Off Below Inconsistent Nodes If goal nodes are never reached from inconsistent
nodes, the problem solver can cut off search at a node that a gen-
eralized rule identifies as inconsistent. One must be careful in measuring these 
gains, however, because a well-designed original problem solver may be able to 
cut off search below inconsistent nodes even without the generalized rules. 

Cutting Off Above Inconsistent Nodes The problem solver may be able to com- 
bine information provided by more than one generalized rule to cut off search 
above the nodes that the generalized rules identify as inconsistent. One ex- 
ample of this is the use of the single-fault assumption in diagnosis to intersect 
contexts (sets of components) that the generalized rules identify as inconsistent. 

The two items above explain why our use of EBG to generalize conflict set derivations
can improve single-fault candidate generation performance but cannot improve
multiple-fault candidate generation. The monotonicity of inconsistency of contexts 
enables a candidate generator to cut off search below inconsistent nodes, whether 
they are identified using EBG or propagation of values, so using EBG to cut off 
search below inconsistent nodes does not speed up candidate generation. With the 
single-fault assumption, however, the program can intersect inconsistent contexts, so 
that it cuts off search above the contexts that generalized rules identify as inconsis- 
tent. This allows the single-fault candidate generator to consider significantly fewer 
contexts (and hence propagate fewer values) when it uses EBG. 
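
The intersection step described above can be sketched as follows. This is a minimal illustration rather than the candidate generator used in the thesis; the component names M1, M2, M3, A1, A2 and the two conflict sets are assumed for the example.

```python
def single_fault_candidates(components, conflict_sets):
    """Return the components that could be the single faulty component.

    Under the single-fault assumption the faulty component must belong
    to every conflict set, so the candidates are the intersection of
    all conflict sets identified so far.
    """
    candidates = set(components)
    for conflict in conflict_sets:
        candidates &= set(conflict)
    return candidates


# Hypothetical device and conflict sets (assumptions for illustration).
components = ["M1", "M2", "M3", "A1", "A2"]
conflicts = [
    {"M1", "M2", "A1"},
    {"M1", "M3", "A1", "A2"},
]

print(sorted(single_fault_candidates(components, conflicts)))  # ['A1', 'M1']
```

Each additional conflict set can only shrink the candidate set, which is why rules that recognize conflict sets directly from the observations reduce the number of contexts a single-fault candidate generator must examine.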

7.5 Conclusion 

This thesis examined ways to use domain knowledge to learn as much as possible 
from single examples. We suggested that there are many kinds of similarity between 
diagnostic examples, and that each kind of similarity provides an opportunity for 
learning. One should be careful, however, to select only those opportunities that 
will actually improve a problem solver's performance. Hence, we presented some
experimental results and analyzed the factors that determine the performance effects 
of our learning methods. 



Appendix A 

A Circuit With an Exponential 
Number of Conflict Sets 



It is difficult to characterize the class of circuits for which there will be only a few
patterns of behavior rule firings that can predict contradictory values. There are
potentially an exponential number of possible conflict sets (every subset of the components
is a potential conflict set), but many circuits will not have nearly that many.
For example, there are only three possible conflict sets for the polybox circuit.
[dKW87] characterizes circuits with few conflict sets as those that are "weakly
connected," and some readers have taken that to mean circuits with few wires.
That interpretation cannot be correct, and this appendix forces a clarification of the
characterization by demonstrating a class of circuits with low connectivity that
can produce a number of minimal conflict sets exponential in the square root of the
number of circuit components.

Figure A.1 gives a schematic representation of a binary tree with $k$ alternating
layers of AND-gates and OR-gates. It has $n = 2^k - 1$ components. Assume that all
of the inputs are 1, but the output is observed to be 0. To deduce the value 1 at the
output of an AND-gate at depth $j$, both of its inputs need to be 1. Let $C_{AND}(j, v)$ be
the number of different sets of components that can predict the value $v$ at the output
of an AND-gate at depth $j$, and let $C_{OR}(j, v)$ be the corresponding number for an
OR-gate. Any combination of the support sets for predicting 1 at
depth $j-1$ can be used to predict a 1 at depth $j$. Hence, $C_{AND}(j, 1) = C_{OR}(j-1, 1)^2$.
But to deduce the value 1 at depth $j-1$, only one of the inputs to the OR-gate must
be 1, so $C_{OR}(j-1, 1) = 2\,C_{AND}(j-2, 1)$. Hence, $C_{AND}(j, 1) = (2\,C_{AND}(j-2, 1))^2$.

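A solution for this recurrence is $C_{AND}(j, 1) = 2^{2^{(j+1)/2} - 2}$. The derivation below verifies
the solution by induction:

$$(2\,C_{AND}(j-2, 1))^2 = \left(2 \cdot 2^{2^{(j-1)/2} - 2}\right)^2 = \left(2^{2^{(j-1)/2} - 1}\right)^2 = 2^{2^{(j+1)/2} - 2}$$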
Figure A.1: AND/OR tree; inputs filter through alternating layers of AND and OR
gates. The number of possible conflict sets is exponential in the square root of the
number of components.
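
The counting argument in this appendix can be checked on small trees by enumerating the support sets directly. The sketch below rests on several assumptions that the text does not fix: depth is counted upward from the inputs, odd depths hold AND-gates, an AND-gate at depth 1 is fed directly by circuit inputs (so it has exactly one support set), and the closed form used for comparison is $2^{2^{(j+1)/2} - 2}$. The gate naming scheme is likewise hypothetical.

```python
from itertools import product


def support_sets(depth, name="r"):
    """Enumerate the sets of gates that suffice to predict 1 at a gate's output.

    All circuit inputs are taken to be 1. Odd depths are AND-gates and
    even depths are OR-gates (an assumption about the layering).
    """
    gate = frozenset([name])
    if depth == 1:
        # An AND-gate fed directly by inputs: only the gate itself is needed.
        return [gate]
    left = support_sets(depth - 1, name + "0")
    right = support_sets(depth - 1, name + "1")
    if depth % 2 == 1:
        # AND: both inputs must be predicted, so pair every left set
        # with every right set.
        return [gate | l | r for l, r in product(left, right)]
    # OR: one input at 1 is enough, so either child's sets will do.
    return [gate | s for s in left + right]


def closed_form(depth):
    """Assumed closed form for the count at an AND-gate at odd depth."""
    return 2 ** (2 ** ((depth + 1) // 2) - 2)


for j in (1, 3, 5):
    sets = support_sets(j)
    assert len(set(sets)) == len(sets)  # every enumerated set is distinct
    print(j, 2 ** j - 1, len(sets), closed_form(j))
```

Each printed row gives the depth, the number of components in the tree, the enumerated count, and the value of the assumed closed form. Together with the observed output of 0, every enumerated set is a conflict set.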
Appendix B

More Experimental Results 



Chapter 3 presented results from a learning program that used EBG to generalize 
derivations of conflict sets. In that chapter, just one experiment was reported for 
the polybox circuit and one for the adder circuit. In this appendix, we report the 
results from several other experiments, with different choices of training and test sets, 
with different size training and test sets, and with a different implementation of the 
original diagnostic engine. 

The first four lines of each summary correspond to the four runs described in 
Chapter 3. In the first run, the program diagnosed each of the training examples 
without using or constructing any generalized rules. In the second run, the training 
examples were diagnosed again, this time constructing generalized rules and using 
them on later examples. Note that the times reported are only for using the gener- 
alized rules, not for constructing them. In the third run, each of the test cases was 
diagnosed, without using or constructing any generalized rules. In the fourth run, 
each of the test cases was diagnosed, using all of the generalized rules constructed 
during the second run. 

The last two lines present additional information. The first line gives the time 
necessary to perform constraint suspension on each of the final candidates for the test 
cases. It is a measure of the lower bound on diagnosis time described in Section 2.3. 
The last line of each summary measures the precision-for-speed tradeoff described in
Section 3.9. It gives the time taken to check the generalized rules, and the number
of candidates produced if the program does not fall back on the model to identify
additional conflict sets. 